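Automating mp4 video transcoding with Python for Lambda

This post is the Day 3 entry in the Qiita Advent Calendar "今年もやるよ！AWS Lambda縛り Advent Calendar 2015" (roughly, "We're doing it again this year! AWS Lambda-only Advent Calendar 2015").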

f:id:akiyoko:20151203083001j:plain

1, 2, 3, Lambdaaaaaaa!!

Since this is the December 3 (12/3) entry, I got a little carried away.
Let's get right to it.

Introduction

So far I have tried out "How to operate Elastic Transcoder with Boto3", "How to operate Amazon SNS with Boto3", and "Basic operations of Python for Lambda (Python Functions)".

<Previous posts>
akiyoko.hatenablog.jp

akiyoko.hatenablog.jp

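This time, as a wrap-up of those posts, I will automate the following pipeline: AWS Lambda detects an mp4 file upload event on S3, starts Amazon Elastic Transcoder, and transcodes the mp4 file into HLS-format video files.

Goal

Use AWS Lambda to detect that an mp4 file has been uploaded to S3, then start Amazon Elastic Transcoder to transcode the mp4 file into HLS-format video files.

I was going to draw an overview diagram, but an AWS slide deck already contains exactly that diagram.

【AWS初心者向けWebinar】AWSから始める動画配信 ([AWS Webinar for Beginners] Getting Started with Video Delivery on AWS) from Amazon Web Services Japan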

Steps

The steps in this post are outlined below.

1. Create a Role in IAM
2. Prepare the S3 buckets
3. Create the Lambda Function
4. Try it in a real environment

1. Create a Role in IAM

First, create the IAM Role that the Lambda function will run as.

In the IAM Management Console, click "Create New Role".
f:id:akiyoko:20151107203504p:plain

Create a Role named "lambda_auto_transcoder_role" (any name will do).
f:id:akiyoko:20151107203709p:plain

For Role Type, select "AWS Lambda".
Be careful: if you don't select "AWS Lambda" here, the Role you create will not appear in the Role drop-down menu on the Lambda side.
f:id:akiyoko:20151107204035p:plain

I want to attach an Inline Policy directly instead of a Managed Policy, so do nothing on this screen and move on.
f:id:akiyoko:20151107204117p:plain

f:id:akiyoko:20151107204157p:plain
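For readers who prefer code to console clicks, choosing Role Type "AWS Lambda" amounts to creating a role whose trust policy lets the Lambda service (lambda.amazonaws.com) assume it. The sketch below is minimal and works under that assumption. `lambda_trust_policy` and `create_lambda_role` are hypothetical helpers, `create_role` is the real IAM API, and the client passed in is assumed to be `boto3.client('iam')`.

```python
import json


def lambda_trust_policy():
    """Trust policy letting AWS Lambda assume the role (what Role Type "AWS Lambda" sets up)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_lambda_role(iam_client, role_name):
    """Create the role and return its ARN; iam_client is assumed to be boto3.client('iam')."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    return response["Role"]["Arn"]
```

Usage: `create_lambda_role(boto3.client('iam'), 'lambda_auto_transcoder_role')`. Passing the client in keeps the policy-building part testable without AWS credentials.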

Next, attach the Inline Policy.
Select the "lambda_auto_transcoder_role" you just created.
f:id:akiyoko:20151107204320p:plain

Under "Inline Policies", click "click here".
f:id:akiyoko:20151107204359p:plain

Select Custom Policy.
f:id:akiyoko:20151107204438p:plain

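Name the Inline Policy "lambda_auto_transcoder_role_policy" (any name will do) and set it as follows.

The policy below grants only the access needed for the features used here, kept as close to the minimum as practical. It covers CloudWatch Logs (for AWS Lambda's log output), IAM (iam:GetRole to look up the role's ARN, iam:PassRole to hand the role to the Elastic Transcoder pipeline), S3, Amazon Elastic Transcoder, and Amazon SNS.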

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:PassRole"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:Put*",
                "s3:Get*",
                "s3:*MultipartUpload*"
            ],
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "elastictranscoder:*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sns:CreateTopic",
                "sns:Publish"
            ],
            "Resource": "*"
        }
    ]
}

f:id:akiyoko:20151107204525p:plain
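The same inline policy can also be attached from code. The minimal sketch below rebuilds the policy shown above as a Python dict and attaches it with `put_role_policy`, the IAM API behind inline role policies. It assumes a `boto3.client('iam')` is passed in, and `transcoder_policy` and `attach_inline_policy` are hypothetical helper names.

```python
import json


def transcoder_policy():
    """The inline policy above, as a dict."""
    def allow(actions, resource):
        return {"Effect": "Allow", "Action": actions, "Resource": resource}

    return {
        "Version": "2012-10-17",
        "Statement": [
            # CloudWatch Logs, so the function's print() output is recorded
            allow(["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                  "arn:aws:logs:*:*:*"),
            # GetRole: look up the role ARN; PassRole: hand the role to the pipeline
            allow(["iam:GetRole", "iam:PassRole"], "*"),
            allow(["s3:ListBucket", "s3:Put*", "s3:Get*", "s3:*MultipartUpload*"],
                  "arn:aws:s3:::*"),
            allow(["elastictranscoder:*"], "*"),
            allow(["sns:CreateTopic", "sns:Publish"], "*"),
        ],
    }


def attach_inline_policy(iam_client, role_name, policy_name):
    """Attach transcoder_policy() as an inline policy; iam_client is assumed to be boto3.client('iam')."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(transcoder_policy()),
    )
```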

Here is a catch (?).
With only the settings so far, when AWS Lambda calls Amazon Elastic Transcoder, you get an error like this:

arn:aws:iam::xxxxxxxxxxxx:role/lambda_auto_transcoder_role either does not exist or has not granted Amazon Elastic Transcoder the sts:AssumeRole permission.

The pipeline created by the function runs under this same Role, so Amazon Elastic Transcoder must be allowed to assume it through AssumeRole in AWS Security Token Service (AWS STS). Put simply, you need to add Amazon Elastic Transcoder to the Role's trust policy.

<Reference>
IAMロール徹底理解〜 AssumeRoleの正体｜ Developers.IO (A thorough understanding of IAM roles: what AssumeRole really is)

Following that article, edit the trust policy from the IAM Role's [Trust Relationships] tab.

Click "Edit Trust Relationship".
f:id:akiyoko:20151107205325p:plain

Add "elastictranscoder.amazonaws.com" as shown below.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "elastictranscoder.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

f:id:akiyoko:20151107205359p:plain
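Editing the trust relationship boils down to adding one more service to `Principal.Service` and saving the document. The minimal sketch below assumes you already hold the current trust policy as a dict. `add_service_principal` and `update_trust_policy` are hypothetical helpers, while `update_assume_role_policy` is the IAM API that replaces a role's trust policy. Only the first statement is touched, which matches the single-statement policy above.

```python
import copy
import json


def add_service_principal(trust_policy, service):
    """Return a copy of trust_policy whose first statement also trusts `service`."""
    policy = copy.deepcopy(trust_policy)
    principal = policy["Statement"][0]["Principal"]
    services = principal.get("Service", [])
    if isinstance(services, str):  # a single service may be written as a plain string
        services = [services]
    if service not in services:  # idempotent: adding twice changes nothing
        services.append(service)
    principal["Service"] = services
    return policy


def update_trust_policy(iam_client, role_name, trust_policy):
    """Replace the role's trust policy; iam_client is assumed to be boto3.client('iam')."""
    iam_client.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(trust_policy),
    )
```

For example, call `update_trust_policy(boto3.client('iam'), 'lambda_auto_transcoder_role', add_service_principal(current, 'elastictranscoder.amazonaws.com'))`, where `current` is the trust policy you started from.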

The final "lambda_auto_transcoder_role" looks like this.
f:id:akiyoko:20151107205445p:plain
f:id:akiyoko:20151107205510p:plain

2. Prepare the S3 buckets

Prepare an input bucket and an output bucket for Amazon Elastic Transcoder.

Input bucket

The input bucket is named "lambda-transcoder-in".

Set its Bucket Policy as follows.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::xxxxxxxxxxxx:role/lambda_auto_transcoder_role"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::lambda-transcoder-in/*"
        }
    ]
}

(The AWS Account ID is shown as "xxxxxxxxxxxx".)

f:id:akiyoko:20151107205745p:plain

Output bucket

The output bucket is named "lambda-transcoder-out".

Set its Bucket Policy as follows.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::xxxxxxxxxxxx:role/lambda_auto_transcoder_role"
            },
            "Action": [
                "s3:Put*",
                "s3:*MultipartUpload*"
            ],
            "Resource": "arn:aws:s3:::lambda-transcoder-out/*"
        }
    ]
}

f:id:akiyoko:20151107205813p:plain
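Both bucket policies follow the same pattern: they allow the role's ARN a set of actions on `arn:aws:s3:::<bucket>/*`. The minimal sketch below generates both policies and applies each with S3's `put_bucket_policy`. It assumes a `boto3.client('s3')` is passed in. `xxxxxxxxxxxx` remains a placeholder for the account ID, and the helper names are hypothetical.

```python
import json

# Placeholder account ID, as in the policies above
ROLE_ARN = "arn:aws:iam::xxxxxxxxxxxx:role/lambda_auto_transcoder_role"


def bucket_policy(bucket, role_arn, actions):
    """Allow role_arn to perform `actions` on every object in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": actions,
                "Resource": "arn:aws:s3:::{}/*".format(bucket),
            }
        ],
    }


def apply_bucket_policies(s3_client, role_arn=ROLE_ARN):
    """Set the input/output bucket policies; s3_client is assumed to be boto3.client('s3')."""
    policies = {
        # Input: the pipeline only needs to read the uploaded mp4
        "lambda-transcoder-in": bucket_policy("lambda-transcoder-in", role_arn, "s3:GetObject"),
        # Output: the pipeline writes HLS files, possibly via multipart upload
        "lambda-transcoder-out": bucket_policy(
            "lambda-transcoder-out", role_arn, ["s3:Put*", "s3:*MultipartUpload*"]),
    }
    for bucket, policy in policies.items():
        s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
    return policies
```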

3. Create the Lambda Function

Now create the Python Function.

f:id:akiyoko:20151107205849p:plain

Any Blueprint (template) will do, since the Event can be changed later. Here I select "s3-get-object-python".
f:id:akiyoko:20151107205911p:plain

Configure the Event source as follows.

Setting	Example value
Event source type	S3
Bucket	lambda-transcoder-in
Event type	Object Created (All)
Prefix	-
Suffix	mp4

f:id:akiyoko:20151107205949p:plain
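Behind the scenes, enabling this event source roughly sets up two things. First, S3 gets permission to invoke the function. Second, the input bucket gets a notification configuration with the suffix filter from the table (Prefix is left unset). The minimal Boto3 sketch below assumes Lambda and S3 clients are passed in. The `StatementId` value and the helper names are hypothetical, and note that `put_bucket_notification_configuration` replaces the bucket's whole notification configuration.

```python
def notification_config(function_arn, suffix="mp4"):
    """Invoke function_arn on every ObjectCreated event whose key ends with `suffix`."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],  # "Object Created (All)"
                "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}},
            }
        ]
    }


def enable_s3_trigger(lambda_client, s3_client, function_arn, bucket="lambda-transcoder-in"):
    """lambda_client / s3_client are assumed to be boto3.client('lambda') / boto3.client('s3')."""
    # Allow S3, for this bucket only, to invoke the function
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="s3-invoke-autoTranscoder",  # hypothetical; any unique ID works
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn="arn:aws:s3:::{}".format(bucket),
    )
    # Register the trigger (overwrites any existing notification settings on the bucket)
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=notification_config(function_arn),
    )
```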

Set the Function Name to "autoTranscoder" and leave the Description empty.
f:id:akiyoko:20151107210411p:plain

For Role, select "lambda_auto_transcoder_role", created in step 1.

The actual processing takes about 2,200 ms, so to be safe, change the Timeout under Advanced settings from 3 seconds to 10 seconds.
f:id:akiyoko:20151107210729p:plain
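If you manage the function from code instead of the console, the same change is a single call to Lambda's `update_function_configuration`, whose `Timeout` is given in seconds. The minimal sketch below assumes a `boto3.client('lambda')` is passed in, and `set_timeout` is a hypothetical helper. The 900-second upper bound reflects today's Lambda limit, which was lower in 2015.

```python
def set_timeout(lambda_client, function_name="autoTranscoder", seconds=10):
    """Set the function timeout in seconds (the console default was 3)."""
    if not 1 <= seconds <= 900:  # Lambda currently allows 1 to 900 seconds
        raise ValueError("timeout must be between 1 and 900 seconds")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=seconds,
    )
    return response["Timeout"]
```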

Paste the following code as the Python Function.

import boto3
from botocore.exceptions import ClientError
import json
from urllib.parse import unquote_plus  # Python 3; on the Python 2.7 runtime this was urllib.unquote_plus

REGION_NAME = 'ap-northeast-1'
TRANSCODER_ROLE_NAME = 'lambda_auto_transcoder_role'
PIPELINE_NAME = 'HLS Transcoder'
OUT_BUCKET_NAME = 'lambda-transcoder-out'
COMPLETE_TOPIC_NAME = 'test-complete'

print('Loading function')

s3 = boto3.resource('s3')
iam = boto3.resource('iam')
sns = boto3.resource('sns', REGION_NAME)
transcoder = boto3.client('elastictranscoder', REGION_NAME)


def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))

    # Get ARNs (create_topic returns the existing topic if it already exists)
    complete_topic_arn = sns.create_topic(Name=COMPLETE_TOPIC_NAME).arn
    transcoder_role_arn = iam.Role(TRANSCODER_ROLE_NAME).arn

    # Get the object from the event (S3 delivers the key URL-encoded)
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])
    print("bucket={}, key={}".format(bucket, key))
    try:
        obj = s3.Object(bucket, key)
        obj.load()  # s3.Object() alone makes no request; load() issues a HEAD to check the object
    except ClientError as e:
        print(e)
        print("Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.".format(key, bucket))
        # Publish a message
        sns.Topic(complete_topic_arn).publish(
            Subject="Error!",
            Message="Failed to get object from S3. bucket={}, key={}, {}".format(bucket, key, e),
        )
        raise

    # Delete existing pipelines with this name (deletion fails, and is only logged, while a pipeline still has active jobs)
    pipeline_ids = [pipeline['Id'] for pipeline in transcoder.list_pipelines()['Pipelines'] if pipeline['Name'] == PIPELINE_NAME]
    for pipeline_id in pipeline_ids:
        try:
            response = transcoder.delete_pipeline(Id=pipeline_id)
            print("Delete a transcoder pipeline. pipeline_id={}".format(pipeline_id))
            print("response={}".format(response))
        except Exception as e:
            # Raise nothing
            print("Failed to delete a transcoder pipeline. pipeline_id={}".format(pipeline_id))
            print(e)

    # Create a pipeline
    try:
        response = transcoder.create_pipeline(
            Name=PIPELINE_NAME,
            InputBucket=bucket,
            OutputBucket=OUT_BUCKET_NAME,
            Role=transcoder_role_arn,
            Notifications={
                'Progressing': '',
                'Completed': complete_topic_arn,
                'Warning': '',
                'Error': ''
            },
        )
        pipeline_id = response['Pipeline']['Id']
        print("Create a transcoder pipeline. pipeline_id={}".format(pipeline_id))
        print("response={}".format(response))
    except Exception as e:
        print("Failed to create a transcoder pipeline.")
        print(e)
        # Publish a message
        sns.Topic(complete_topic_arn).publish(
            Subject="Error!",
            Message="Failed to create a transcoder pipeline. bucket={}, key={}, {}".format(bucket, key, e),
        )
        raise

    # Create a job
    try:
        job = transcoder.create_job(
            PipelineId=pipeline_id,
            Input={
                'Key': key,
                'FrameRate': 'auto',
                'Resolution': 'auto',
                'AspectRatio': 'auto',
                'Interlaced': 'auto',
                'Container': 'auto',
            },
            Outputs=[
                {
                    'Key': 'HLS/1M/{}'.format('.'.join(key.split('.')[:-1])),
                    'PresetId': '1351620000001-200030',  # System preset: HLS 1M
                    'SegmentDuration': '10',
                },
            ],
        )
        job_id = job['Job']['Id']
        print("Create a transcoder job. job_id={}".format(job_id))
        print("job={}".format(job))
    except Exception as e:
        print("Failed to create a transcoder job. pipeline_id={}".format(pipeline_id))
        print(e)
        # Publish a message
        sns.Topic(complete_topic_arn).publish(
            Subject="Error!",
            Message="Failed to create transcoder job. pipeline_id={}, {}".format(pipeline_id, e),
        )
        raise

    return "Success"

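By default, Elastic Transcoder can hold at most four pipelines in total, so the function first deletes existing pipelines with the same name. A pipeline that still has jobs in progress cannot be deleted, and that error is only logged.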
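The deletion step can be isolated into a helper. The minimal sketch below assumes a `boto3.client('elastictranscoder')` is passed in. It first collects the matching IDs through the client's `list_pipelines` paginator, so it also works beyond a single page. It then deletes them, skipping any pipeline the service refuses to delete (for example, one that still has jobs in progress). `delete_pipelines_named` is a hypothetical name.

```python
def delete_pipelines_named(transcoder_client, name):
    """Delete every pipeline called `name`; return (deleted_ids, skipped_ids)."""
    paginator = transcoder_client.get_paginator("list_pipelines")
    # Collect first, then delete, so deletions don't disturb the pagination
    ids = [
        pipeline["Id"]
        for page in paginator.paginate()
        for pipeline in page["Pipelines"]
        if pipeline["Name"] == name
    ]
    deleted, skipped = [], []
    for pipeline_id in ids:
        try:
            transcoder_client.delete_pipeline(Id=pipeline_id)
            deleted.append(pipeline_id)
        except Exception as e:  # e.g. a pipeline that still has active jobs
            print("Skipped pipeline {}: {}".format(pipeline_id, e))
            skipped.append(pipeline_id)
    return deleted, skipped
```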

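For reference, the Elastic Transcoder part uses the following Boto3 client methods: list_pipelines, delete_pipeline, create_pipeline, and create_job.

Select "Enable now" for the Event source and click "Create Function". This completes the creation of the Lambda Function.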
f:id:akiyoko:20151107211304p:plain

f:id:akiyoko:20151107211335p:plain

Let's test it.
f:id:akiyoko:20151107211551p:plain

f:id:akiyoko:20151107211612p:plain
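The console test feeds the function a sample S3 event, and the handler reads only `Records[0].s3.bucket.name` and `Records[0].s3.object.key` from it. S3 delivers the key URL-encoded (a space becomes `+`, and non-ASCII characters become `%XX` escapes), which is why the handler URL-decodes it with `unquote_plus`. The minimal sketch below builds such a test event by hand for local experiments. `make_s3_put_event` and `bucket_and_key` are hypothetical helpers, and the event contains only the fields the handler uses, not the full S3 event structure.

```python
from urllib.parse import quote_plus, unquote_plus


def make_s3_put_event(bucket, key):
    """Minimal S3 event containing only the fields lambda_handler reads."""
    return {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},
                    # S3 sends object keys URL-encoded ('/' is left as-is)
                    "object": {"key": quote_plus(key, safe="/")},
                },
            }
        ]
    }


def bucket_and_key(event):
    """Same extraction and decoding as lambda_handler."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])


event = make_s3_put_event("lambda-transcoder-in", "D0002022073_00000/my sample.mp4")
print(event["Records"][0]["s3"]["object"]["key"])  # D0002022073_00000/my+sample.mp4
print(bucket_and_key(event))
```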

I got stuck here for a while.
Running the test produced the following error several times.

Log output

START RequestId: 66b929e2-8510-11e5-b2c3-99a74c2c1765 Version: $LATEST
An error occurred (InvalidClientTokenId) when calling the CreateTopic operation: The security token included in the request is invalid.: ClientError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 24, in lambda_handler
    complete_topic_arn = sns.create_topic(Name=COMPLETE_TOPIC_NAME).arn
  File "/var/runtime/boto3/resources/factory.py", line 394, in do_action
    response = action(self, *args, **kwargs)
  File "/var/runtime/boto3/resources/action.py", line 77, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/var/runtime/botocore/client.py", line 310, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 395, in _make_api_call
    raise ClientError(parsed_response, operation_name)
ClientError: An error occurred (InvalidClientTokenId) when calling the CreateTopic operation: The security token included in the request is invalid.

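The cause seems to be that the Role settings had not yet propagated within AWS. I had created an IAM Role named "lambda_auto_transcoder_role" a little earlier, then deleted it, recreated a Role with the same name, and ran the Lambda Function. I suspect that is what triggered this error.

In my case, the error went away after about five hours.

<Reference>
InvalidClientTokenId => The security token included in the request is invalid · Issue #21 · fog/fog-aws · GitHub

4. Try it in a real environment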

Upload an mp4 file to the S3 input bucket.
f:id:akiyoko:20151107212149p:plain

After a while, an email notification arrived from SNS.
The state is "COMPLETED".

<Email>

Amazon Elastic Transcoder has finished transcoding job 1446894541726-3waslu.

{
  "state" : "COMPLETED",
  "version" : "2012-09-25",
  "jobId" : "1446894541726-3waslu",
  "pipelineId" : "1446894541305-zo2lwm",
  "input" : {
    "key" : "D0002022073_00000/sample.mp4",
    "frameRate" : "auto",
    "resolution" : "auto",
    "aspectRatio" : "auto",
    "interlaced" : "auto",
    "container" : "auto"
  },
  "outputs" : [ {
    "id" : "1",
    "presetId" : "1351620000001-200030",
    "key" : "HLS/1M/D0002022073_00000/sample",
    "segmentDuration" : 10.0,
    "status" : "Complete",
    "statusDetail" : "Some individual segment files for this output have a higher bit rate than the average bit rate of the transcoded media. Playlists including this output will record a higher bit rate than the rate specified by the preset.",
    "duration" : 40,
    "width" : 640,
    "height" : 360
  } ]
}
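The notification body is JSON, so a subscriber (another Lambda function, for example) can extract the fields it needs. The minimal sketch below assumes the JSON part of the message is available as a string. `SAMPLE` is a trimmed excerpt of the notification above, and `summarize_job_notification` is a hypothetical helper.

```python
import json

# Trimmed excerpt of the notification shown above
SAMPLE = """{
  "state" : "COMPLETED",
  "jobId" : "1446894541726-3waslu",
  "pipelineId" : "1446894541305-zo2lwm",
  "input" : { "key" : "D0002022073_00000/sample.mp4" },
  "outputs" : [ {
    "id" : "1",
    "key" : "HLS/1M/D0002022073_00000/sample",
    "status" : "Complete",
    "duration" : 40,
    "width" : 640,
    "height" : 360
  } ]
}"""


def summarize_job_notification(message):
    """Return (state, input key, [(output key, status), ...]) from an Elastic Transcoder notification."""
    body = json.loads(message)
    outputs = [(o["key"], o["status"]) for o in body.get("outputs", [])]
    return body["state"], body["input"]["key"], outputs


state, input_key, outputs = summarize_job_notification(SAMPLE)
print(state, input_key, outputs)
```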

The transcoded HLS files have also been placed in the S3 output bucket.

<S3 output bucket>
f:id:akiyoko:20151107212410p:plain

Logs were also written to CloudWatch.

<CloudWatch Logs>

Loading function 
START RequestId: f09505b7-853f-11e5-9a79-e33dc551d93f Version: $LATEST 
bucket=lambda-transcoder-in, key=D0002022073_00000/sample.mp4 
Delete a transcoder pipeline. pipeline_id=1446890654281-yttx9x 
response={'ResponseMetadata': {'HTTPStatusCode': 202, 'RequestId': 'f2ff00f2-853f-11e5-8fdf-67ed42920387'}} 
Create a transcoder pipeline. pipeline_id=1446894541305-zo2lwm 
response={u'Pipeline': {u'Status': u'Active', u'ContentConfig': {u'Bucket': u'lambda-transcoder-out', u'Permissions': []}, u'Name': u'HLS Transcoder', u'ThumbnailConfig': {u'Bucket': u'lambda-transcoder-out', u'Permissions': []}, u'Notifications': {u'Completed': u'arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:test-complete', u'Warning': u'', u'Progressing': u'', u'Error': u''}, u'Role': u'arn:aws:iam::xxxxxxxxxxxx:role/lambda_auto_transcoder_role', u'InputBucket': u'lambda-transcoder-in', u'OutputBucket': u'lambda-transcoder-out', u'Id': u'1446894541305-zo2lwm', u'Arn': u'arn:aws:elastictranscoder:ap-northeast-1:xxxxxxxxxxxx:pipeline/1446894541305-zo2lwm'}, 'ResponseMetadata': {'HTTPStatusCode': 201, 'RequestId': 'f31d5e2d-853f-11e5-a4f1-c5fc4d6a4741'}} 
Create a transcoder job. job_id=1446894541726-3waslu 
job={u'Job': {u'Status': u'Submitted', u'Playlists': [], u'Outputs': [{u'Status': u'Submitted', u'PresetId': u'1351620000001-200030', u'Watermarks': [], u'SegmentDuration': u'10.0', u'Key': u'HLS/1M/D0002022073_00000/sample', u'Id': u'1'}], u'PipelineId': u'1446894541305-zo2lwm', u'Output': {u'Status': u'Submitted', u'PresetId': u'1351620000001-200030', u'Watermarks': [], u'SegmentDuration': u'10.0', u'Key': u'HLS/1M/D0002022073_00000/sample', u'Id': u'1'}, u'Timing': {u'SubmitTimeMillis': 1446894541789}, u'Input': {u'Container': u'auto', u'FrameRate': u'auto', u'Key': u'D0002022073_00000/sample.mp4', u'AspectRatio': u'auto', u'Resolution': u'auto', u'Interlaced': u'auto'}, u'Id': u'1446894541726-3waslu', u'Arn': u'arn:aws:elastictranscoder:ap-northeast-1:xxxxxxxxxxxx:job/1446894541726-3waslu'}, 'ResponseMetadata': {'HTTPStatusCode': 201, 'RequestId': 'f35f4936-853f-11e5-8c93-9d0b85e88cf1'}} 
END RequestId: f09505b7-853f-11e5-9a79-e33dc551d93f 
REPORT RequestId: f09505b7-853f-11e5-9a79-e33dc551d93f  Duration: 2188.52 ms    Billed Duration: 2200 ms Memory Size: 128 MB    Max Memory Used: 58 MB
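The REPORT line also shows how the invocation was billed. In 2015, Lambda rounded each invocation's duration up to the next 100 ms (billing has been per 1 ms since late 2020), which is why 2188.52 ms became a Billed Duration of 2200 ms. Compute usage is counted in GB-seconds, that is, memory in GB times billed seconds. The small worked example below uses the numbers from this log and treats 1 GB as 1024 MB (so 128 MB = 0.125 GB):

```python
import math


def billed_ms(duration_ms, granularity_ms=100):
    """Round a Lambda duration up to the billing granularity (100 ms in 2015, 1 ms today)."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def gb_seconds(memory_mb, billed_duration_ms):
    """Compute usage: memory in GB times billed duration in seconds."""
    return (memory_mb / 1024) * (billed_duration_ms / 1000)


print(billed_ms(2188.52))      # 2200
print(billed_ms(2188.52, 1))   # 2189
print(gb_seconds(128, 2200))   # 0.275
```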

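Summary

My impression after using AWS Lambda so far is that the IAM-related setup is a bit confusing (perhaps I simply lack a fundamental understanding of IAM). Once you get past that, though, AWS Lambda is certainly a powerful tool.

Tomorrow's Day 4 entry is by crifff. Please look forward to it!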