Deploying Data Science with Docker and AWS

  • Published on
    14-Apr-2017

Transcript

Deploying Data Science with Docker and AWS
Audience: Cambridge AWS Meetup Group
Presenter: Matt McDonnell, Data Scientist at Metail
Date: 9th June 2016

Context
- Lots of event stream data
- Many AWS components
- Outputs:
  - Business Intelligence
  - Bespoke Analysis
  - Productionised Science

What?
Goal: Moving laptop analyses onto a server
Turn:
- run_analysis.sh
- analysis script retrieves data from DB, Looker, web, etc.
- runs analysis
- outputs results as csv, png, etc. to local hard disk
Into:
- Automated process running on a server

Why?
- Production scheduled task, e.g. Firm Wide Metrics daily processing
- Make use of more powerful Amazon Web Services (AWS) cloud resources for large scale analysis
- Ease of deployment for Data Science analysts
- Build consistent development environment

How?
- Containerize applications and runtime using Docker to produce images
- Store images on AWS Elastic Container Registry (ECR)
- Run images either locally, or on Amazon Elastic Container Service (ECS)
- Use AWS Lambda functions to trigger scheduled tasks (or react to events)

What is Docker?
"Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries - anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in."
-- https://www.docker.com/what-docker
- Public code: store Dockerfile on GitHub, use Travis to automatically build image on DockerHub
- Private code: private Dockerfile, build locally, push image to AWS Elastic Container Registry

Example application: retrieve market data
- PyAnalysis: Application code built on PCR image
  https://github.com/mattmcd/PyAnalysis
- PCR: Python Component Runtime, base Docker image
  https://github.com/mattmcd/PCR

Where?
Amazon Web Services Cloud
- Elastic Container Service (ECS)
  - Defines the task that runs the container
  - Runs tasks on a cluster of EC2 nodes
- EC2 instance set up to act as node
  - Needs to be an AWS ECS optimized AMI
    https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
  - Needs an IAM Role that has:
    - AmazonEC2ContainerServiceforEC2Role policy attached
    - Policies to allow access to any AWS resources needed, e.g. S3
- Lambda function to trigger ECS task
  - cron equivalent by using CloudWatch scheduled events

EC2 Instance Security Group
EC2 instance used by ECS can be locked down: no need to SSH in to it, so no inbound ports needed

EC2 Instance AMI
Use latest available Amazon ECS Optimized AMI: it has Docker and ECS Container Agent already installed

EC2 Instance Details
Enable Auto-assign Public IP so ECS can connect, and assign a custom IAM Role as a hook for access permissions

EC2 Instance IAM Role
Attach AmazonEC2ContainerServiceforEC2Role Policy and any extra access Policies for containers on the instance

ECS Task
ECS task retrieves image and runs it

Lambda function
Use the lambda-canary blueprint as a basis for cron job equivalents

Lambda function
cron job equivalent via CloudWatch scheduled event

Lambda function
Simple Lambda function to run task on ECS

Lambda function IAM role
AWS will create default IAM Roles for Lambda function; need to add ecs:RunTask to run container

Demo / Q&A
Blog posts
- Scheduled Downloads using AWS EC2 and Docker, Medium, http://bit.ly/1TO9a1h (me)
- Better Together: Amazon ECS and AWS Lambda, http://amzn.to/1UkitEF (not me)
Code samples
- https://github.com/mattmcd/PyAnalysis
- https://github.com/mattmcd/PCR
Docker images
- mattmcd/pyanalysis
- mattmcd/pcr
Me
- Twitter @mattmcd
- Email matt@metail.com or
  matt@matt-mcdonnell.com
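
The "Simple Lambda function to run task on ECS" slide could look roughly like the sketch below. It is not the code shown in the talk. It assumes a Python Lambda runtime where boto3 is available, and the cluster name, task definition name, and the ECS_CLUSTER / ECS_TASK_DEFINITION environment variables are hypothetical placeholders. The helper takes the ECS client as an argument so it can be tried without AWS credentials.

```python
import os


def run_ecs_task(ecs_client, cluster, task_definition):
    """Start one copy of the task on the cluster; return the started task ARNs."""
    response = ecs_client.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        count=1,
    )
    # run_task reports placement problems in "failures" rather than raising
    failures = response.get("failures", [])
    if failures:
        raise RuntimeError("ECS could not start the task: {}".format(failures))
    return [task["taskArn"] for task in response.get("tasks", [])]


def lambda_handler(event, context):
    """Entry point invoked by the CloudWatch scheduled event."""
    # Imported here so run_ecs_task can be exercised without boto3 installed
    import boto3

    ecs_client = boto3.client("ecs")
    return run_ecs_task(
        ecs_client,
        # Hypothetical names: replace with your own cluster and task definition
        cluster=os.environ.get("ECS_CLUSTER", "default"),
        task_definition=os.environ.get("ECS_TASK_DEFINITION", "pyanalysis"),
    )
```

Attached to a CloudWatch scheduled event, a handler like this plays the role of the cron job equivalent described in the slides. The function's IAM role needs ecs:RunTask, as noted on the Lambda function IAM role slide.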