Articles

Hadoop - A Look at RunJar

A Hadoop job is usually launched with the following command:

${HADOOP_HOME}/bin/hadoop jar your.jar mainClass args

After submitting this command, running "jps" shows that a process named "org.apache.hadoop.util.RunJar" is running:

19141 RunJar

This "org.apache.hadoop.util.RunJar" process is started by the "${HADOOP_HOME}/bin/hadoop" shell script: the script maps the "jar" command to the class "org.apache.hadoop.util.RunJar" and launches it, which is the first step of a Hadoop job.

hadoop shell (lines:228-229)

elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar

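Starting from the main method: it first tries to read the Main-Class attribute from the manifest of "your.jar" and uses it as mainClassName. If the manifest does not specify one, the class name is taken from the command-line arguments instead.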

RunJar.java (lines:94-107)

Manifest manifest = jarFile.getManifest();
    if (manifest != null) {
      mainClassName = manifest.getMainAttributes().getValue("Main-Class");
    }
    jarFile.close();

    if (mainClassName == null) {
      if (args.length < 2) {
        System.err.println(usage);
        System.exit(-1);
      }
      mainClassName = args[firstArg++];
    }
    mainClassName = mainClassName.replaceAll("/", ".");
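
The lookup order above can be tried out without Hadoop. The following is a minimal sketch, not RunJar itself: the class name MainClassResolver and the method resolve are made up for illustration, and the manifest is built in memory instead of being read from a jar file.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper that mirrors the lookup order in the excerpt above.
class MainClassResolver {
    /** Prefers the manifest's Main-Class; otherwise falls back to args[firstArg]. */
    static String resolve(Manifest manifest, String[] args, int firstArg) {
        String name = null;
        if (manifest != null) {
            name = manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
        if (name == null) {
            if (args.length <= firstArg) {
                throw new IllegalArgumentException("no main class given");
            }
            name = args[firstArg];
        }
        // Accept both demo/WordCount and demo.WordCount, as RunJar does.
        return name.replace('/', '.');
    }

    public static void main(String[] argv) {
        Manifest withMain = new Manifest();
        withMain.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "demo/WordCount");
        // args[0] is the jar, so the first candidate for a class name is args[1].
        System.out.println(resolve(withMain, new String[] {"your.jar", "in"}, 1));
        System.out.println(resolve(new Manifest(), new String[] {"your.jar", "demo.Other", "in"}, 1));
    }
}
```

In the first call the manifest wins and "in" stays an ordinary argument; in the second the empty manifest makes the first argument after the jar the class name.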

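Next, the program extracts "your.jar" into a temporary directory whose location depends on the "hadoop.tmp.dir" setting. According to "${HADOOP_HOME}/src/core/core-default.xml", the default value is "/tmp/hadoop-${user.name}". As the source below shows, it first tries to create "/tmp/hadoop-${user.name}" (normally needed only on the first run). It then calls "File.createTempFile()" to reserve a unique "hadoop-unjar*" name, deletes the file it just created, and recreates that path as a directory. The contents of "your.jar", including its class files, are extracted into this directory, and a shutdown hook deletes the directory when the JVM exits.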

RunJar.java (lines:109-132)

File tmpDir = new File(new Configuration().get("hadoop.tmp.dir"));
    tmpDir.mkdirs();
    if (!tmpDir.isDirectory()) { 
      System.err.println("Mkdirs failed to create " + tmpDir);
      System.exit(-1);
    }
    final File workDir = File.createTempFile("hadoop-unjar", "", tmpDir);
    workDir.delete();
    workDir.mkdirs();
    if (!workDir.isDirectory()) {
      System.err.println("Mkdirs failed to create " + workDir);
      System.exit(-1);
    }
    
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
          try {
            FileUtil.fullyDelete(workDir);
          } catch (IOException e) {
          }
        }
      });
     
    unJar(file, workDir);
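
The createTempFile, delete, mkdirs sequence is easy to misread, so here is a minimal standalone sketch of it. The class UnjarWorkDir and its two methods are hypothetical names; deleteRecursively is a plain-Java stand-in for Hadoop's FileUtil.fullyDelete, and the parent directory under java.io.tmpdir is an assumption made so the example runs anywhere.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper reproducing the work-directory setup shown above.
class UnjarWorkDir {
    /** Creates parent if needed, then a uniquely named hadoop-unjar* directory inside it. */
    static File createWorkDir(File parent) throws IOException {
        if (!parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("Mkdirs failed to create " + parent);
        }
        // createTempFile makes a FILE with a unique name; swap it for a directory.
        File workDir = File.createTempFile("hadoop-unjar", "", parent);
        if (!workDir.delete() || !workDir.mkdirs()) {
            throw new IOException("Mkdirs failed to create " + workDir);
        }
        return workDir;
    }

    /** Stand-in for FileUtil.fullyDelete: removes a file or a whole directory tree. */
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        File parent = new File(System.getProperty("java.io.tmpdir"), "hadoop-demo");
        final File workDir = createWorkDir(parent);
        // Clean up when the JVM exits, as RunJar does with its shutdown hook.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> deleteRecursively(workDir)));
        System.out.println("working in " + workDir);
    }
}
```

Between delete() and mkdirs() the name is briefly unreserved, so this pattern is not safe against a hostile process racing for the same path; java.nio.file.Files.createTempDirectory avoids that window on newer JVMs.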

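Finally, it builds a URLClassLoader over the extracted directory, the jar itself, "classes/", and every file under "lib/", and then uses reflection to load the main class of "your.jar" dynamically and invoke its main method.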

RunJar.java (lines:134-159)

ArrayList<URL> classPath = new ArrayList<URL>();
    classPath.add(new File(workDir+"/").toURL());
    classPath.add(file.toURL());
    classPath.add(new File(workDir, "classes/").toURL());
    File[] libs = new File(workDir, "lib").listFiles();
    if (libs != null) {
      for (int i = 0; i < libs.length; i++) {
        classPath.add(libs[i].toURL());
      }
    }
    
    ClassLoader loader =
      new URLClassLoader(classPath.toArray(new URL[0]));

    Thread.currentThread().setContextClassLoader(loader);
    Class<?> mainClass = Class.forName(mainClassName, true, loader);
    Method main = mainClass.getMethod("main", new Class[] {
      Array.newInstance(String.class, 0).getClass()
    });
    String[] newArgs = Arrays.asList(args)
      .subList(firstArg, args.length).toArray(new String[0]);
    try {
      main.invoke(null, new Object[] { newArgs });
    } catch (InvocationTargetException e) {
      throw e.getTargetException();
    }
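
The same load-and-invoke step can be sketched in isolation. This is an illustration under stated assumptions rather than RunJar's code: ReflectiveLauncher, launch, and the nested Target class are hypothetical, and Target stands in for the job's main class so that the example needs no jar on disk (the URL array is therefore empty and the class is found through the parent loader).

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;

class ReflectiveLauncher {
    /** Stand-in for the job's main class; a real one would live inside your.jar. */
    public static class Target {
        static String[] lastArgs;

        public static void main(String[] args) {
            lastArgs = args;
            System.out.println("Target.main got " + Arrays.toString(args));
        }
    }

    /** Loads mainClassName through a URLClassLoader and calls its static main. */
    static void launch(URL[] classPath, String mainClassName, String[] args, int firstArg)
            throws Exception {
        ClassLoader loader =
            new URLClassLoader(classPath, ReflectiveLauncher.class.getClassLoader());
        Thread.currentThread().setContextClassLoader(loader);
        Class<?> mainClass = Class.forName(mainClassName, true, loader);
        Method main = mainClass.getMethod("main", String[].class);
        // Drop the leading arguments (jar name, class name) before handing over.
        String[] newArgs = Arrays.copyOfRange(args, firstArg, args.length);
        try {
            // The cast keeps the array from being spread as varargs.
            main.invoke(null, (Object) newArgs);
        } catch (InvocationTargetException e) {
            // Surface the exception thrown by the target's main, not the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) throw (Exception) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw e;
        }
    }

    public static void main(String[] argv) throws Exception {
        String[] args = {"your.jar", "Target", "in", "out"};
        launch(new URL[0], Target.class.getName(), args, 2);
    }
}
```

String[].class is equivalent to the Array.newInstance(String.class, 0).getClass() expression in the excerpt, which predates common use of class literals for array types.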

2009-12-16 00:09:37

A Few Words on Playfish

In General

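The image above is a screenshot of Bowling Buddies, a game designed by Playfish. The point of this post is not the game itself, though. Look at the bottom of the screenshot: a certain "Amr Awadallah" (his blog) has played this game too! So who is he? He is the CTO of Cloudera, a company that provides services around Apache Hadoop. Anyone who has played this social game knows that a score of "190" is not something you reach on your first try. So, looking at Playfish from another angle: why would even Amr Awadallah play its games, and what kind of company is Playfish?

According to "Playfish - wiki", Playfish was founded in October 2007 by Kristian Segerstrale, Sebastien de Halleux, Sami Lababidi, and Shukri Shammas, and mainly develops social games for social networking platforms. On November 9, 2009, the company was acquired by Electronic Arts for US$275 million (We're combining forces with EA!). How could so much change in just two years? I suspect the four founders did not see it coming either (pure speculation). The article that Mr. 6 published on his site on November 11, "AdMob and Playfish acquired at high prices: the traits of success shown by two 32-year-old entrepreneurs in changing times", offers a useful point of reference.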

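Mr. 6 draws the following four conclusions in that article:

1. Pick one thing you love and keep doing it for good. "Do it until you die; anywhere will do."

2. Then find the newest platform or trend, and do that thing you love from point 1 on top of it. Follow the trend when it is time to follow it; whichever trend you follow, the core remains the same thing.

3. You do not have to be in Silicon Valley. As long as you pick the right platform in point 2, you can succeed anywhere.

4. Give yourself unlimited practice: the points above can be repeated year after year. Both Omar and Kristian built something else before they struck it rich. Without building those earlier small ideas, how would they have "tried" their way to the real "problems" on the web today? By and large, our brains are no smarter than anyone else's. The only way to out-wit others is to see what they have not seen and then, with a little brainpower, build it. Execution takes a little more brainpower, of course, but seeing what others have not seen is already more than half the battle. The only way to see it is to pull back the curtains, and the only way to pull them back is to keep charging forward.

Now from a technical angle: how could a company only two years old afford that much network traffic, storage, and hardware (not to mention electricity)? Only with heavy funding from day one, one would think, yet that does not "seem" to be the case. Perhaps it is fair to say that the answer is "cloud" services (Playfish Case Study: Amazon Web Services). The traffic, storage, and machines mentioned above are all handled by services from Amazon. Everything is rented and billed by usage; when the machines cannot handle the load, more can be added online, and there is no hardware wear to pay for. The whole team can therefore "focus" on one thing: putting all of its effort into building a good product. I think this is one of the reasons for their success (the definition of success varies from person to person; here it means building something that people all over the world use). And the future? Who will be the next Playfish? I do not know. But I do have one conclusion: "Being part of a team that is united in accomplishing one thing together is a blessing!"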

2009.12.13 updated

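According to Playfish's official recruiting site, the company is backed by the venture capital firm Accel and was founded with "US$4 million".

- "Restaurant City" gets a new owner: Electronic Arts acquires Playfish with an eye on emerging game segments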

2009-12-13 01:30:07

AS3 - Timeline_N

In Flash, ActionScript 3.0

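The previous post, "AS3 in the Document Class and on the Timeline", showed that when AS3 code is written on the timeline, the Flash compiler automatically generates a dynamic class named "MainTimeline". What happens, then, if we create a "MovieClip" in the Flash IDE, add a "stop()" call to one of its frames, and compile it with Flash CS4?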

Decompiling:

package test_fla
{
    import flash.display.*;

    dynamic public class Timeline_1 extends MovieClip
    {
        public function Timeline_1()
        {
            addFrameScript(61, this.frame62);
            return;
        }
        function frame62()
        {
            stop();
            return;
        }
    }
}

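As the code above shows, AS3 written on a frame of the main timeline makes Flash generate the dynamic class "MainTimeline", whereas AS3 written inside any "MovieClip" produces a dynamic class named "Timeline_N", where the integer "N" is chosen by the Flash compiler. Note that addFrameScript takes a zero-based frame index, so index 61 is paired with the method frame62. We can also call "getQualifiedClassName(this);" to obtain the class name of the symbol (for example, test_fla::Timeline_1 for the package shown above). Knowing which class corresponds to the symbol, can we create another instance of it in the usual way? Yes, as follows: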

var rect:MovieClip = new Timeline_1();
rect.x = 100;
addChild(rect);

Also worth noting: even writing nothing but the comment marker "//" on the frame produces the following code:

package test_fla
{
    import flash.display.*;

    dynamic public class Timeline_1 extends MovieClip
    {

        public function Timeline_1()
        {
            addFrameScript(61, this.frame62);
            return;
        }
        function frame62()
        {
            return;
        }
    }
}

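So the Flash compiler clearly still has room for improvement here: it emits an empty frame script for a frame that contains nothing but a comment.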

2009-12-10 23:25:15 | Comments (1)

WordCount - HBase 0.20.x

In Hadoop, HBase

This post presents a simple WordCount program whose MapReduce output is written directly to HBase. The example implementation is as follows:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WordCountHBase
{
    public static class Map extends Mapper<LongWritable,Text,Text, IntWritable>
    {
        private IntWritable i = new IntWritable(1);
        @Override
        public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException
        {
            String s[] = value.toString().trim().split(" ");
            for( String m : s)
            {
                context.write(new Text(m), i);
            }
        }
    }
    public static class Reduce extends TableReducer<Text, IntWritable, NullWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for(IntWritable i : values)
            {
                sum += i.get();
            }
           
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes("content"), Bytes.toBytes("count"), Bytes.toBytes(String.valueOf(sum)));
            context.write(NullWritable.get(), put);
        }
    }
    public static void createHBaseTable(String tablename)throws IOException
    {
        HTableDescriptor htd = new HTableDescriptor(tablename);
        HColumnDescriptor col = new HColumnDescriptor("content:");
        htd.addFamily(col);
       
        HBaseConfiguration config = new HBaseConfiguration();
        HBaseAdmin admin = new HBaseAdmin(config);
        if(admin.tableExists(tablename))
        {
            admin.disableTable(tablename);
            admin.deleteTable(tablename);
        }
       
        System.out.println("create new table: " + tablename);
        admin.createTable(htd);
    }
   
    public static void main(String args[]) throws Exception
    {
        String tablename = "wordcount";
       
        Configuration conf = new Configuration();
        conf.set(TableOutputFormat.OUTPUT_TABLE, tablename);
        createHBaseTable(tablename);

        String input = args[0];
        Job job = new Job(conf, "WordCount table with " + input);
       
        job.setJarByClass(WordCountHBase.class);
        job.setNumReduceTasks(3);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
       
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
       
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(input));
       
        System.exit(job.waitForCompletion(true)?0:1);
    }
}
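
To see which rows and cell values the job above would produce without a cluster, the map and reduce logic can be replayed in plain Java. This is a rough sketch: WordCountLocal, count, and cellValue are hypothetical names, the input lines are invented, and the UTF-8 encoding mirrors what Bytes.toBytes(String) does in HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Local replay of the Map and Reduce classes above, with no Hadoop or HBase needed.
class WordCountLocal {
    /** Splits each line the way Map does and sums the ones the way Reduce does. */
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split(" ")) {
                sums.merge(word, 1, Integer::sum);
            }
        }
        return sums;
    }

    /** The job stores the count as the bytes of its decimal string, not as a binary int. */
    static byte[] cellValue(int sum) {
        return String.valueOf(sum).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, Integer> sums = count(Arrays.asList("hello hbase", "hello hadoop"));
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            String stored = new String(cellValue(e.getValue()), StandardCharsets.UTF_8);
            System.out.println("row=" + e.getKey() + " content:count=" + stored);
        }
    }
}
```

Because the sketch keeps the original split(" "), consecutive spaces or an empty line produce an empty-string word, exactly as the Map class would; splitting on "\\s+" and skipping empty tokens would avoid that.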

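As the program shows, Reduce directly extends TableReducer<KEYIN,VALUEIN,KEYOUT>. As the API documentation states, however, KEYOUT is ignored by TableOutputFormat, and VALUEOUT can only be a Put or a Delete, which the following source confirms: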

TableOutputFormat.java

public void write(KEY key, Writable value) throws IOException
{
	if (value instanceof Put)
		this.table.put(new Put((Put) value));
	else if (value instanceof Delete)
		this.table.delete(new Delete((Delete) value));
	else
		throw new IOException("Pass a Delete or a Put");
}
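
The dispatch above is why the reducer can pass NullWritable as its key. The following standalone sketch shows the same pattern with stand-in types; TableWriterSketch, PutOp, and DeleteOp are hypothetical and replace the HBase table, Put, and Delete so that the example runs without HBase.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TableWriterSketch {
    // Hypothetical stand-ins for HBase's Put and Delete mutations.
    static class PutOp {
        final String row;
        PutOp(String row) { this.row = row; }
    }

    static class DeleteOp {
        final String row;
        DeleteOp(String row) { this.row = row; }
    }

    // Records what would have been sent to the table.
    final List<String> log = new ArrayList<>();

    /** The key is accepted but never read; only the runtime type of the value matters. */
    void write(Object key, Object value) throws IOException {
        if (value instanceof PutOp) {
            log.add("put " + ((PutOp) value).row);
        } else if (value instanceof DeleteOp) {
            log.add("delete " + ((DeleteOp) value).row);
        } else {
            throw new IOException("Pass a Delete or a Put");
        }
    }

    public static void main(String[] args) throws IOException {
        TableWriterSketch writer = new TableWriterSketch();
        writer.write(null, new PutOp("hello"));
        writer.write("ignored", new DeleteOp("hello"));
        System.out.println(writer.log);
    }
}
```

Any other value type fails at write time with an IOException rather than at compile time, so a mistyped reducer output only surfaces once the job runs.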

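As for which table the output goes to, you must set the "TableOutputFormat.OUTPUT_TABLE" configuration property. Setting "hbase.mapred.outputtable" directly is equivalent, since that is the value of the constant.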

TableOutputFormat.java

public static final String OUTPUT_TABLE = "hbase.mapred.outputtable";

2009-12-08 21:55:33 | Comments (2)

AS3 in the Document Class and on the Timeline

In Flash, ActionScript 3.0

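In the Flash IDE, AS3 code can be written not only in a Document Class but also on the timeline, and the two approaches can even be mixed. So how does code in a Document Class differ from code on the timeline? Each case is examined below: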

Document Class - Hello

Hello.as

package
{
	import flash.display.*;
	
	public class Hello extends MovieClip
	{
		public function Hello()
		{
			trace("Hello");
		}
	}
}

Set the Document Class to "Hello" and export the SWF file.

Decompiling:

package 
{
    import flash.display.*;

    public class Hello extends MovieClip
    {
        public function Hello()
        {
            trace("Hello");
            return;
        }
    }
}

Judging from the result, the structure of the program is largely unchanged. Next, the timeline approach.

Timeline - Hello

Write the following directly on Frame[0] of the timeline:

trace("Hello");

Then publish it directly as an SWF file.

Decompiling:

package hello_fla
{
    import flash.display.*;

    dynamic public class MainTimeline extends MovieClip
    {
        public function MainTimeline()
        {
            addFrameScript(0, this.frame1);
            return;
        }
        function frame1()
        {
            trace("Hello");
            return;
        }
    }
}

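With the timeline approach, the single line "trace("Hello");" is completely repackaged: the Flash compiler automatically generates a dynamic class, "MainTimeline", to hold it. Moreover, the original "trace("Hello");" is not placed in the MainTimeline constructor. It is instead bound explicitly to "Frame 0" through the "addFrameScript" function. What happens, then, when the Document Class and timeline approaches are mixed?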

Document Class & Timeline - Hello

Reuse "Hello.as" from above as is, and add "trace("Hello2");" at Frame[0] of the timeline to test.

Decompiling:

package 
{
    import flash.display.*;

    public class Hello extends MovieClip
    {
        public function Hello()
        {
            addFrameScript(0, this.frame1);
            trace("Hello");
            return;
        }
        function frame1()
        {
            trace("Hello2");
            return;
        }
    }
}

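Mixing the Document Class and the timeline shows that the Flash compiler merges the timeline code into the Document Class: the frame code becomes a frame1 method, and the constructor registers it with addFrameScript. Although addFrameScript appears before trace("Hello") in the constructor, it only registers the script, so in terms of runtime execution order (not source order) the program prints: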

Hello
Hello2
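
The ordering can be modeled outside Flash. The sketch below is a Java analogy, not the Flash Player API: FrameScriptOrder, ClipModel, and enterFrame are hypothetical, and the model assumes only what the decompiled class shows, namely that addFrameScript stores a callback for a zero-based frame index and that the callback runs later, when that frame is entered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrameScriptOrder {
    // Collects what trace() would have printed, in order.
    static final List<String> output = new ArrayList<>();

    /** Minimal model of a clip that holds one script per zero-based frame index. */
    static class ClipModel {
        private final Map<Integer, Runnable> scripts = new HashMap<>();

        void addFrameScript(int frame, Runnable script) {
            scripts.put(frame, script);   // registration only, nothing runs yet
        }

        void enterFrame(int frame) {
            Runnable script = scripts.get(frame);
            if (script != null) {
                script.run();
            }
        }
    }

    /** Mirrors the decompiled Hello class: register frame1, then trace in the constructor. */
    static class Hello extends ClipModel {
        Hello() {
            addFrameScript(0, this::frame1);
            output.add("Hello");
        }

        void frame1() {
            output.add("Hello2");
        }
    }

    public static void main(String[] args) {
        ClipModel clip = new Hello();   // the constructor body runs first
        clip.enterFrame(0);             // the frame script runs once frame 0 is entered
        System.out.println(output);
    }
}
```

The constructor finishes, including its own trace, before the player ever enters the frame, which is why the registration order inside the constructor does not affect the printed order.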

2009-12-07 22:16:28
