Cloud Computing Open source cloud infrastructures Keke Chen.


Outline
- Project 3
- Eucalyptus
- OpenStack

Project 3: using AWS
Tasks (work from nimbus17 or your own PC)
- Create an AWS account and set up the environment
- Try basic EC2 commands
- Start a Hadoop cluster on EC2, using the hadoopEC2 tool
- Read the code of hadoopEC2 to understand how to interact with EC2 in shell scripts

Starting a Hadoop cluster on EC2
- Read src/contrib/ec2/bin/
  - You don't need to change anything there
- You should set up your own environment variables in .profile, .login, or .bashrc
  - AWS_ACCOUNT_ID, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY

Starting Hadoop on EC2
- Copy $HADOOP_HOME/src/contrib/ec2 to your own directory
- % bin/hadoop-ec2 launch-cluster your-cluster-name #ofslaves
- % bin/hadoop-ec2 login your-cluster-name
- Test your cluster
  - /usr/local/hadoop-*
  - hadoop fsck /
- Diagnose problems (understand the Hadoop setup)

The source of the EC2 tool
- Check the script hadoop-ec2 and learn how to
  - Automatically launch instances
  - Pass initialization scripts to instances
  - Change the Hadoop configuration
- Answer some questions

Make your own AMI
- Install a recent Hadoop version, e.g., 1.0.x, in the AMI
- HadoopEC2 provides some scripts, but they need to be revised to work with the current setting

Experiment with HDFS and S3
- Hadoop can use either HDFS or S3 as the storage for MapReduce
- You need to learn the performance difference between these two options
  - How to configure Hadoop to use S3
  - Conduct a simple experiment to compare the performance of the two storage options

Most popular open-source AWS equivalents
- Eucalyptus
  - Started by UCSB researchers, now a company
- OpenStack
  - Started by NASA, now an open source platform

Eucalyptus
- Compatible with the AWS APIs (mainly EC2 and S3)
  - Thus, the Boto library can be used, too
- A good example for understanding how AWS works
- Paper: "The Eucalyptus Open-source Cloud-computing System"
  - How VM instances are managed
  - How to provide a virtual network (like elastic IP)
  - How to provide data storage (like S3)
  - A very brief description, but we can get something

System Design
[Figure labels: Data center; CLC: cloud controller; Walrus: storage controller similar to S3; CC: cluster controller; NC: node controller]

Components: Node Controller
- Makes queries to discover physical resources
  - # of cores
  - Size of memory
  - Available disk space
- State of VM instances
- Propagates the information to the Cluster Controller
  - DescribeResource
  - DescribeInstances
- Run/terminate instances
[Figure labels: CLC, CC, NC, hypervisor (Xen)]

Node Controller
- Start an instance
  - Copy the instance image from Walrus or the local cache
  - Create an endpoint in the virtual network overlay
  - Instruct the hypervisor to boot the instance
- Stop an instance
  - Instruct the hypervisor to terminate the VM
  - Tear down the virtual network endpoint
  - Clean up the files associated with the instance

Cluster Controller
- Gather/report information about NCs
  - Through the interface provided by the NCs
  - Report the summary to the CLC
- Schedule incoming instance run requests to specific NCs
- Control the virtual network overlay

Virtual network overlay
- VM instance interconnectivity (between different nodes/networks)
  - Not discussed much in Xen
- Connectivity, isolation, and performance
  - At least one of a set of VMs must be exposed externally
    - Map the public IP to that instance
  - Restricted communication
    - VMs in the same set can talk to each other
    - VMs from different sets should be isolated

Virtual network overlay (continued)
- Each VM has a private IP; one VM in the set also has a public IP
- A VLAN tag defines the subnet to isolate sets of VMs
- The Cluster Controller serves as the router between VM subnets
  - The CC uses Linux iptables to control traffic
  - It uses iptables Network Address Translation (NAT) to define the mapping from public IP to private IP

Storage Controller (Walrus)
- Provides SOAP/REST interfaces
  - Compatible with S3, so you can use S3 tools
- Use Walrus to
  - Stream data in/out of the cloud
  - Store VM images (same as AMI)
    - Root file system, kernel image, ramdisk image
- No locking for object writes
  - Conflicting writes: the later write overwrites the earlier one
- Provides the same tool Amazon uses
  - Generate AMI
- Maintains a cache of images
- Authentication is applied when an NC accesses images

Cloud Controller
- A collection of web services
  - Resource services
  - Data services
  - Interface services

Cloud Controller: resource services
- Receive user requests
- Interact with CCs to allocate/deallocate
- System Resource State (SRS) is maintained by querying CCs
  - CCs collect information from NCs
- Follows a transactional operation
  - Reservation, VM creation, then commit
  - Or, on errors, rollback
- Realizing SLAs

Cloud Controller: data services
- Handles the creation, modification, interrogation, and storage of stateful system and user data
  - There is a system database
- Users can query the services
  - Discover resource info (images, clusters)
  - Manipulate abstract parameters (keypairs, security groups, network definitions)
  - Recall some of the AWS interfaces

Cloud Controller: interface services
- User-visible interfaces
  - Programmatic interfaces (SOAP/REST)
  - Web interface
- Handling authentication
- Provide system management tools

OpenStack
- Originated at NASA, with Rackspace
- Driven by an open community process
- Multiple hypervisors: Xen, KVM, ESXi, Hyper-V
- First release: Oct 2010

Components
- Nova: compute (equivalent to EC2)
- Swift: object storage (S3)
- Image service (AMI)
- Networking (virtual network)
- Block storage (Elastic Block Store)
- Identity
- Dashboard (AWS web console)
- Mostly implemented in Python

Fastest Growing Global Open Source Community (as of July 2013)
- Companies: 231
- Total contributors: 1,036
- Average monthly contributors: 238
- Code contributions: 70,137
- Individual members: 10,149
- Countries: 121

Global Community
[Figure: countries with members]

Developer Growth
[Figure: contributors per month (ohloh)]

1 Million+ Lines of Code
[Figure: lines of code (ohloh)]

Ecosystem Growth
[Figure: participating companies]

Notes
- More than 10k members! You should join!
- Of the total contributors, 800-900 are still active, which shows that people stick around.
- Even had downloads from Antarctica!
- We're also deleting hundreds of thousands of lines as well. E.g., in the last 12 months: added 2,936,791 lines, removed 1,594,506.
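
The Eucalyptus slides describe a three-level flow: each Node Controller reports its cores, memory, and disk; the Cluster Controller schedules run requests onto specific NCs; and the Cloud Controller treats an allocation as a reservation that is either committed or rolled back on error. The following is a minimal Python sketch of that flow, not Eucalyptus code. All class and function names are hypothetical, and the first-fit placement policy is an assumption, since the slides do not name a scheduling policy.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical ID generator; real instance IDs look different.
_instance_ids = itertools.count(1)


@dataclass
class NodeController:
    """Hypothetical stand-in for an NC: tracks free resources and running VMs."""
    name: str
    cores: int
    memory_mb: int
    disk_gb: int
    instances: dict = field(default_factory=dict)

    def describe_resource(self):
        # Loosely mirrors the DescribeResource call named on the slides.
        return {"cores": self.cores, "memory_mb": self.memory_mb,
                "disk_gb": self.disk_gb}

    def fits(self, req):
        free = self.describe_resource()
        return all(free[key] >= amount for key, amount in req.items())

    def run_instance(self, instance_id, req):
        self.cores -= req["cores"]
        self.memory_mb -= req["memory_mb"]
        self.disk_gb -= req["disk_gb"]
        self.instances[instance_id] = req

    def terminate_instance(self, instance_id):
        # Give the resources back when the VM is torn down.
        req = self.instances.pop(instance_id)
        self.cores += req["cores"]
        self.memory_mb += req["memory_mb"]
        self.disk_gb += req["disk_gb"]


class ClusterController:
    """Hypothetical CC: places each request on the first NC with room."""

    def __init__(self, nodes):
        self.nodes = nodes

    def schedule(self, instance_id, req):
        # First fit is an assumption made for this sketch.
        for nc in self.nodes:
            if nc.fits(req):
                nc.run_instance(instance_id, req)
                return nc
        return None


def run_instances(cc, count, req):
    """CLC-style all-or-nothing request: commit only if every VM is placed."""
    placed = []
    for _ in range(count):
        instance_id = f"i-{next(_instance_ids)}"
        nc = cc.schedule(instance_id, req)
        if nc is None:
            # Error path: roll back everything placed for this request.
            for done_id, done_nc in placed:
                done_nc.terminate_instance(done_id)
            return []
        placed.append((instance_id, nc))
    return [(instance_id, nc.name) for instance_id, nc in placed]
```

With two nodes of 2 and 4 cores, a request for three single-core VMs is committed across both nodes, while a follow-up request for five more cannot be satisfied and leaves the free resources unchanged after rollback.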

