add documentation
lingyielia committed Oct 24, 2018
1 parent fc2c416 commit 90f3ae5
Showing 10 changed files with 66 additions and 17 deletions.
Binary file modified fig/demo.gif
Binary file modified fig/diagram.png
1 change: 0 additions & 1 deletion src/cloudwatch/create_cloudwatch_rule.sh

This file was deleted.

10 changes: 10 additions & 0 deletions src/cloudwatch/run_cloudwatch_trigger.sh
@@ -0,0 +1,10 @@
# Using an AWS cron expression,
# run the target daily at 11:00 a.m. UTC
aws events put-rule \
--schedule-expression "cron(0 11 * * ? *)" \
--name RunDaily

aws events put-targets \
--rule RunDaily \
--targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:257685430371:function:lambda_check_api_status"
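
Note on the schedule expression: AWS cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year) rather than the five of classic cron, and one of the two day fields must be `?`. The sketch below is not part of this commit; `parse_aws_cron` and `AWS_CRON_FIELDS` are hypothetical names, and the function only labels the fields so the expression in `run_cloudwatch_trigger.sh` is easier to read. It does not validate field ranges.

```python
# Field names in the order AWS schedule expressions list them.
AWS_CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_aws_cron(expression):
    """Split an AWS 'cron(...)' schedule expression into named fields."""
    expression = expression.strip()
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError("expected an expression of the form cron(...)")
    parts = expression[len("cron("):-1].split()
    if len(parts) != len(AWS_CRON_FIELDS):
        raise ValueError("AWS cron expressions have exactly six fields")
    fields = dict(zip(AWS_CRON_FIELDS, parts))
    # Day-of-month and day-of-week cannot both be specified:
    # one of the two must be the '?' placeholder.
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("use '?' in day_of_month or day_of_week")
    return fields


# The expression used by the RunDaily rule above.
print(parse_aws_cron("cron(0 11 * * ? *)"))
```

For the rule above, `hours` is `11` and `minutes` is `0`, which corresponds to the 11:00 UTC daily schedule described in the script comment.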
32 changes: 31 additions & 1 deletion src/emr/emr.sh
@@ -1 +1,31 @@
aws emr create-cluster --applications Name=Ganglia Name=Spark Name=Zeppelin --tags 'emr=' --ec2-attributes '{"KeyName":"lz1714-IAM-keypair","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-aa6fed84","EmrManagedSlaveSecurityGroup":"sg-0800e273b14dd59b2","EmrManagedMasterSecurityGroup":"sg-043757d5f69d0a814"}' --service-role EMR_DefaultRole --enable-debugging --release-label emr-5.17.0 --log-uri 's3n://aws-logs-257685430371-us-east-1/elasticmapreduce/' --name 'data311in' --instance-groups '[{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m4.large","Name":"Core Instance Group"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m4.large","Name":"Master Instance Group"}]' --configurations '[{"Classification":"spark","Properties":{"maximizeResourceAllocation":"true"},"Configurations":[]}]' --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region us-east-1
# launch an EMR cluster with 1 master and 2 workers
aws emr create-cluster \
--applications Name=Ganglia Name=Spark Name=Zeppelin \
--tags 'emr=' \
--ec2-attributes '{"KeyName":"lz1714-IAM-keypair",
"InstanceProfile":"EMR_EC2_DefaultRole",
"SubnetId":"subnet-aa6fed84",
"EmrManagedSlaveSecurityGroup":"sg-0800e273b14dd59b2",
"EmrManagedMasterSecurityGroup":"sg-043757d5f69d0a814"}' \
--service-role EMR_DefaultRole \
--enable-debugging \
--release-label emr-5.17.0 \
--log-uri 's3n://aws-logs-257685430371-us-east-1/elasticmapreduce/' \
--name 'data311in' \
--instance-groups '[{"InstanceCount":2,
"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},
"VolumesPerInstance":1}]},
"InstanceGroupType":"CORE",
"InstanceType":"m4.large",
"Name":"Core Instance Group"},
{"InstanceCount":1,
"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},
"VolumesPerInstance":1}]},
"InstanceGroupType":"MASTER",
"InstanceType":"m4.large",
"Name":"Master Instance Group"}]' \
--configurations '[{"Classification":"spark",
"Properties":{"maximizeResourceAllocation":"true"},
"Configurations":[]}]' \
--scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
--region us-east-1
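
Note on the `--instance-groups` argument: the two group definitions differ only in count, type, and name, so the JSON could be generated instead of written by hand. This is a minimal sketch, not part of this commit; `instance_group` is a hypothetical helper, and its defaults (`m4.large`, one 32 GB `gp2` volume) simply mirror the values in `emr.sh`.

```python
import json


def instance_group(group_type, name, count, instance_type="m4.large", ebs_gb=32):
    """Return one entry for the --instance-groups argument of create-cluster."""
    return {
        "InstanceCount": count,
        "EbsConfiguration": {
            "EbsBlockDeviceConfigs": [
                {
                    "VolumeSpecification": {"SizeInGB": ebs_gb, "VolumeType": "gp2"},
                    "VolumesPerInstance": 1,
                }
            ]
        },
        "InstanceGroupType": group_type,
        "InstanceType": instance_type,
        "Name": name,
    }


# Same layout as emr.sh: two core nodes and one master node.
groups = [
    instance_group("CORE", "Core Instance Group", count=2),
    instance_group("MASTER", "Master Instance Group", count=1),
]

# Compact JSON that can be passed as a single quoted shell argument.
instance_groups_json = json.dumps(groups, separators=(",", ":"))
print(instance_groups_json)
```

The printed string can be pasted between single quotes after `--instance-groups`; generating it this way avoids mismatched brackets when the cluster layout changes.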
4 changes: 4 additions & 0 deletions src/emr/historical_dataclean.py
@@ -1,3 +1,7 @@
#
# This PySpark job reads all historical data from S3.
# After cleaning, the data can be stored in S3 or in PostgreSQL on RDS.
#
from pyspark.sql.types import *

# define schema
2 changes: 2 additions & 0 deletions src/frontend/README.md
@@ -0,0 +1,2 @@
The Tableau dashboard is embedded in index.html.
The website uses Flask as its framework and is hosted on Heroku.
15 changes: 0 additions & 15 deletions src/kinesis/kinesis_analytics.sql

This file was deleted.

12 changes: 12 additions & 0 deletions src/lambda/create_function.sh
@@ -0,0 +1,12 @@
# template for creating a Lambda function
# the deployment package is a zip file containing the
# Lambda function code and its dependencies.
aws lambda create-function \
--region us-east-1 \
--function-name lambda_check_api_status \
--zip-file fileb://deployment-package.zip \
--role arn:aws:iam::account-id:role/lambda_basic_execution \
--handler checkApi/lambda_function.lambda_handler \
--runtime python3.6 \
--timeout 60 \
--memory-size 512
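
Note on the deployment package: the script expects `deployment-package.zip` to exist already. One way to build it is sketched below, assuming the function code and its installed dependencies sit together in a single source directory; `build_deployment_package` is a hypothetical helper, not part of this commit. Paths are stored relative to the source directory so that a handler such as `checkApi/lambda_function.lambda_handler` resolves to `checkApi/lambda_function.py` inside the archive.

```python
import os
import zipfile


def build_deployment_package(source_dir, zip_path="deployment-package.zip"):
    """Zip every file under source_dir, storing paths relative to source_dir.

    Write zip_path outside source_dir, otherwise the archive would be
    picked up by the directory walk and added to itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                # Archive name is the path relative to the package root.
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
    return zip_path


# Example call (the "build" directory name is an assumption):
# build_deployment_package("build", "deployment-package.zip")
```

The resulting file is what `--zip-file fileb://deployment-package.zip` refers to in the command above.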
7 changes: 7 additions & 0 deletions src/redshift/create_cluster.sh
@@ -0,0 +1,7 @@
# create a Redshift cluster
aws redshift create-cluster \
--node-type dw.hs1.xlarge \
--number-of-nodes 2 \
--master-username <> \
--master-user-password <> \
--cluster-identifier insightdw
