Hadoop Infrastructure @Uber: Past, Present and Future

  • Published on
    31-Dec-2016


Transcript

  • U B E R | Data

    Hadoop Infrastructure @Uber: Past, Present and Future

    Mayank Bansal

  • U B E R | Data

    Transportation as reliable as running water, everywhere, for everyone

    Uber's Mission

    75+ Countries, 500+ Cities

    And growing

  • U B E R | Data

    How Uber works

  • U B E R | Data

    How Uber works

  • U B E R | Data

    How Uber works

  • U B E R | Data

    Data Driven Decisions

  • U B E R | Data

    Data Infra Once Upon a Time... (2014)

    Kafka Logs

    Key-Val DB

    RDBMS DBs

    S3

    Applications

    ETL

    Business Ops

    A/B Experiments

    Adhoc Analytics

    City Ops

    Vertica Data Warehouse

    Data Science

    EMR

  • U B E R | Data

    Data Infrastructure Today

    Kafka8 Logs

    Schemaless DB

    SOA DBs

    Service Accounts

    ETL

    Machine Learning

    Experimentation

    Data Science

    Adhoc Analytics

    Ops / Data Science

    HDFS

    City Ops

    Data Science

    Spark | Presto | Hive

  • Few Takeaways

    Strict Schema Management

    Because our largest data audience is SQL savvy! (1000s of Uber Ops!)

    SQL = Strict Schema

    Big Data Processing Tools Unlocked - Hive, Presto and Spark

    Migrate SQL savvy users from Vertica to Hive & Presto (1000s of Ops & 100s of data scientists & analysts)

    Spark for more advanced users - 100s of data scientists

  • U B E R | Data

    Hadoop Evolution @ebay

    2014: 1X Nodes, 1X PB

    2015: 10X Nodes, 4X PB Data, 3000+ node, 30,000+ cores, 50+ PB

    2016: 90X Nodes, 40X PB Data

    Hadoop Evolution @Uber

  • U B E R | Data

    Hadoop Cluster Utilization

    Over provisioning for the peak loads.

    Over capacity in anticipation of future growth

  • U B E R | Data

    Hadoop Evolution @ebay

    2014: 0 Nodes

    2015: X Nodes

    2016: 300X Nodes

    Mesos Evolution @Uber

  • U B E R | Data

    Mesos Cluster Utilization

    Over provisioning for the peak loads

    Over capacity in anticipation of future growth

  • U B E R | Data

    End Goal

    Online

    Presto

  • U B E R | Data

    What we need?

    GLOBAL VIEW OF RESOURCES

  • U B E R | Data

    Available Resource Managers

  • U B E R | Data

    Mesos vs YARN

    YARN                                    MESOS
    Single Level Scheduler                  Two Level Scheduler
    Use Cgroups for isolation               Use Cgroups for Isolation
    CPU, Memory as a resource               CPU, Memory and Disk as a resource
    Works well with Hadoop workloads        Works well with longer running services
    YARN supports time based reservations   Mesos does not have support of reservations
    Dominant resource scheduling            Scheduling is done by frameworks and depends on case to case basis

    Scales Better

    Similar Isolation

    Disk is better

    This is Important

    Imp for batch SLAs

    Better for batch
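
    The "dominant resource scheduling" entry in the comparison can be illustrated with a toy allocator. This is only a minimal sketch of Dominant Resource Fairness by progressive filling, not YARN's or Mesos's actual scheduler code; the function name, the user names and the cluster sizes are assumptions made for the example.

```python
def drf_allocate(capacity, demands):
    """Toy Dominant Resource Fairness (progressive filling).

    capacity: {resource: total amount in the cluster}
    demands:  {user: {resource: amount needed per task}}
    Returns {user: number of tasks granted}.
    All names and numbers passed in are illustrative assumptions.
    """
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(user):
        # A user's dominant share is their largest fractional use of any resource.
        return max(tasks[user] * demands[user][r] / capacity[r] for r in capacity)

    active = set(demands)
    while active:
        # Offer the next task to the user with the smallest dominant share.
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            tasks[user] += 1
        else:
            # Simplification: a user whose next task no longer fits is dropped.
            active.discard(user)
    return tasks


# Hypothetical cluster: 9 CPUs and 18 GB of memory shared by two users.
example = drf_allocate(
    {"cpu": 9, "mem_gb": 18},
    {"A": {"cpu": 1, "mem_gb": 4}, "B": {"cpu": 3, "mem_gb": 1}},
)
print(example)
```

    In this example user A is memory-bound and user B is CPU-bound, so the allocator balances A's memory share against B's CPU share rather than comparing a single resource.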

  • U B E R | Data

    Let's tie them together

    YARN is good for Hadoop

    Mesos is good for Longer Running Services

    In a Nutshell

  • U B E R | Data

  • U B E R | Data

    Myriad is Mesos Framework for Apache YARN

    Mesos manages Data Center resources

    YARN manages Hadoop workloads

    Myriad

    Gets resources from Mesos

    Launches Node Managers

  • U B E R | Data

    YARN will handle resources handed over to it.

    Mesos will work on rest of the resources

    Myriad's Limitations: Static Resource Partitioning
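
    One way to see the cost of static partitioning is to compare the capacity needed when each workload gets its own fixed slice (sized for its own peak) with the capacity needed when all workloads share one pool (sized for the combined peak). The sketch below is an illustration only; the demand figures are invented for the example and are not Uber measurements.

```python
def static_partition_capacity(demand_by_workload):
    """Capacity needed if every workload owns a slice sized for its own peak."""
    return sum(max(series) for series in demand_by_workload.values())


def shared_pool_capacity(demand_by_workload):
    """Capacity needed if all workloads share one pool sized for the combined peak."""
    return max(sum(step) for step in zip(*demand_by_workload.values()))


# Hypothetical demand (in nodes) sampled at four points in a day.
demand = {
    "online": [40, 80, 100, 60],
    "batch": [100, 40, 20, 90],
}

static_nodes = static_partition_capacity(demand)
shared_nodes = shared_pool_capacity(demand)
print(static_nodes, shared_nodes)
```

    Because the two workloads peak at different times in this made-up series, the shared pool needs fewer nodes than the sum of the two partitions.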

  • U B E R | Data

    YARN will never be able to do over subscription.

    Node Manager will go away

    Fragmentation of resources

    Mesos over subscription can kill YARN too

    Myriad's Limitations: Resource Over Subscription

  • U B E R | Data

    No Global Quota Enforcement

    No Global Priorities

    Myriad's Limitations

  • U B E R | Data

    Elastic Resource Management

    Bin Packing

    Stability

    Long List

    Myriad's Limitations

  • U B E R | Data

    Unified Scheduler

  • U B E R | Data

    High Level Characteristics

    Global Quota Management

    Central Scheduling policies

    Over subscription for both Online and Batch

    Isolation and bin packing

    SLA guarantees at Global Level
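
    As a rough illustration of what "global quota management" could mean, the sketch below keeps a single ledger that every request, online or batch, must pass through before it is admitted. The class, method and team names are hypothetical and are not taken from the talk or from any existing scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaLedger:
    """Hypothetical cluster-wide quota ledger shared by all workload types."""

    quotas: dict  # team -> {resource: limit}
    usage: dict = field(default_factory=dict)  # team -> {resource: amount in use}

    def try_admit(self, team, request):
        """Admit the request only if it keeps the team within every quota."""
        limit = self.quotas[team]
        current = self.usage.setdefault(team, {r: 0 for r in limit})
        over = any(
            current.get(r, 0) + amount > limit.get(r, 0)
            for r, amount in request.items()
        )
        if over:
            return False
        for r, amount in request.items():
            current[r] = current.get(r, 0) + amount
        return True


# Illustrative quota: one team limited to 10 CPUs and 32 GB across the whole cluster.
ledger = QuotaLedger({"team_a": {"cpu": 10, "mem_gb": 32}})
first = ledger.try_admit("team_a", {"cpu": 8, "mem_gb": 16})   # e.g. a batch job
second = ledger.try_admit("team_a", {"cpu": 4, "mem_gb": 4})   # e.g. an online service
print(first, second)
```

    The point of the single ledger is that the second request is checked against usage from the first one, regardless of which kind of workload made it.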

  • U B E R | Data

    Unified Scheduler

  • U B E R | Data

    Few Takeaways

    We need one scheduling layer across all workloads

    Partitioning resources is not good

    At least can save 30% resources

    Stability and simplicity win in Production

    Multiple levels of resource management and scheduling will not be scalable

  • U B E R | Data

  • U B E R | Data

    Questions?

    mabansal@uber.com

    mayank@apache.org

  • U B E R | Data

    Thank You!!!
