数据格式 aa=aa&bb=bb&cc=cc {"aa":"aa","bb":"bb"} 通用参数商定  date,数据产生时本地日期,格式样例 20180301、2018030112  action,行为名称,需要和数仓中建的对应表名保持一致  time_stamp,日志产生时的时间戳,单位:秒  gameid,游戏id 特殊字符处理 数据收集 打点数据直接记录到数据库中,收集方直接通过账号密码来拉取数据 kinesis 文档:https://aws.amazon.com/cn/kinesis/ 创建和管理流 https://docs.aws.amazon.com/zh_cn/streams/latest/dev/working-with-streams.html 用agent写入数据流 https://docs.aws.amazon.com/zh_cn/streams/latest/dev/writing-with-agents.html Kinesis Data Firehose https://docs.aws.amazon.com/zh_cn/firehose/latest/dev/what-is-this-service.html 客户端 客户端安装:https://aws.amazon.com/cn/kinesis/data-firehose/faqs/ 配置:/etc/aws-kinesis/agent.json { "awsAccessKeyId": "AKIATE7QHHAYUFAXMRZD", "awsSecretAccessKey": "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", "cloudwatch.emitMetrics": true, "kinesis.endpoint": "", "firehose.endpoint": "", "flows": [ { "filePattern": "/tmp/app.log*", "deliveryStream": "Minerva-games-firehose" } ] } 日志目录:/var/log/aws-kinesis-agent/aws-kinesis-agent.log fluentd 文档:https://docs.fluentd.org/ 安装: [yum安装](https://docs.fluentd.org/installation/install-by-rpm) [Ruby Gem安装](https://docs.fluentd.org/installation/install-by-gem) [安装ruby](https://www.ruby-lang.org/zh_cn/documentation/installation/#building-from-source) wget https://cache.ruby-lang.org/pub/ruby/2.7/ruby-2.7.6.tar.gz tar zxvf ruby-2.7.6.tar.gz ./configure make -j6 make install gem install fluentd --no-doc 使用 [Fluentd教程(附实例)](https://www.jianshu.com/p/e7c5f51f290b) Filebeat 用go语言开发,比较轻量级,现在基本上都用这个 logstash 过于臃肿,CPU和内存使用过大 如果想要收集和转换数据,不妨了解一下使用 Logstash 的数据处理管道 数据备份存储 s3 数据存储 redshift 添加字段 ALTER TABLE "public"."log" ADD COLUMN "name" varchar(255); ALTER TABLE "public"."log" ADD COLUMN "level" int4; load数据 copy player from 's3://minerva-bigdata/player.log' CREDENTIALS 'aws_access_key_id=AKIATE7QHHAYUFAXMRZD;aws_secret_access_key=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' delimiter ',' REGION AS 'us-east-1'; copy player from 's3://minerva-bigdata/player.log' CREDENTIALS 'aws_access_key_id=AKIATE7QHHAYUFAXMRZD;aws_secret_access_key=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' json 'auto'; copy gold_consume from 's3://minerva-bigdata/gold_consume/1003' CREDENTIALS 'aws_access_key_id=AKIATE7QHHAYUFAXMRZD;aws_secret_access_key=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' json 'auto'; Athena 文档:https://docs.aws.amazon.com/zh_cn/athena/latest/ug/what-is.html 交互式查询服务,可以使用标准SQL分析S3中的数据 是否提供编程语言接口? 按查询付费 数据查看 quicksight gm后台 aws服务从入门到精通| Amazon Athena操作 https://www.jianshu.com/p/e412bb96c0ca 详解日志采集工具--Logstash、Filebeat、Fluentd、Logagent对比 https://developer.51cto.com/article/595529.html Kinesis Firehose to put data into multiple Redshift tables 这个好像是不行,见下方官方文档 https://aws.amazon.com/cn/kinesis/data-firehose/faqs/ 数据如果定时的copy到redshit中 使用Amazon Kinesis Data Analytics进行分流 https://docs.aws.amazon.com/zh_cn/kinesisanalytics/latest/dev/app-tworecordtypes.html