The Zero Touch Network - Research at Google [PDF]


The Zero Touch Network Bikash Koley For Google Technical Infrastructure CNSM 2016

Confidential + Proprietary


For the past 15 years, Google has been building out the largest cloud infrastructure on the planet.

100 Billion searches per month on google.com (Source: Google, 2012)

Images by Connie Zhou

A Global Cloud Network

[Figure; label: Cluster]

Google Backbone(s)
● Internet-facing backbone, B2: 70+ locations in 33 countries
● Global software-defined inter-DC backbone: B4

Operational scale
● 30,000+ circuits in operation
● Many tens of network element roles
● Dozen+ vendors
● 4M lines of configuration files
● ~30K configuration changes per month
● > 8M OIDs collected every 5 minutes

At scale, stuff breaks!

[Figure; label: Cluster]

The Nines and the Outage Budgets
● … for four 9s availability? 99.99% uptime: 4 minutes per month
● … for five 9s availability? 99.999% uptime: 24 seconds per month
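
The budgets above follow from simple arithmetic, sketched below. The 30-day month is an assumption; the slide does not state a month length. Under that assumption the exact figures come out slightly higher than the rounded numbers on the slide (about 4.3 minutes and about 26 seconds).

```python
# Outage budget for a given availability target.
# Assumption (not stated in the slides): a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def outage_budget_seconds(availability_percent: float) -> float:
    """Return the allowed downtime per month, in seconds."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_MONTH


for target in (99.99, 99.999):
    budget = outage_budget_seconds(target)
    print(f"{target}% uptime: {budget:.1f} s/month ({budget / 60:.2f} min)")
```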

Why is high network availability a challenge?
● Velocity of evolution
● Scale
● Management complexity

Google’s Network Hardware Evolves Constantly

[Figure: capacity (vertical axis) over time (horizontal axis) for the fabric generations 4 Post, Firehose 1.0, Firehose 1.1, Watchtower, Saturn, and Jupiter]

As does the Network Software

[Figure: timeline from 2006 to 2014 showing QUIC, gRPC, Jupiter, Freedome, BwE, Andromeda, B4, Watchtower, and Google Global Cache]

… driven by ever-evolving products


Network Operation is a tradeoff

Traditional network: pick any two of the three (reliability, scale, efficiency)
● {scalable, reliable, inefficient}
● {unscalable, reliable, efficient}
● {scalable, unreliable, efficient}

We want all three!

Lessons learned from a decade of high-availability network design

We analyzed over 100 post-mortem reports written over a 2-year period.

What is a Post-mortem?
● Carefully curated description of a previously unseen failure that had significant availability impact
● Blame-free process
● Learn from failures

Where do failures happen?
● No one network or plane dominates

How long do the failures last?
● Shorter failures on B2
● Durations much longer than outage budgets

What role does network evolution play?
● 70% of failures happen when a management operation is in progress

The Zero Touch Network

{reliability, efficiency, scale} are NOT tradeoffs if network operation is fully intent-driven

[Figure labels: Reliability, efficiency, scale; Intent-driven Operation]

Evolution is inevitable: Design for it!

The Zero Touch Network
● All network operations are automated, requiring no operator steps beyond the instantiation of intent
● Changes applied to individual network elements are fully declarative, vendor-neutral, and derived by the network infrastructure from the high-level network-wide intent
● Any network changes are automatically halted and rolled back if the network displays unintended behavior
● The infrastructure does not allow operations which violate network policies


ZTN Architecture

[Figure: layered architecture, top to bottom: operators (“drain a link”); Workflow Engine; Workflow API; Update Network model (Topology, Config); Network Management Layer (configuration, commands, telemetry); Network devices/systems]

Workflow Engine
● The workflow engine executes a goal-seeking workflow graph
● Workflows are expressed in a meta-language
● All interesting metrics of execution are logged
● Workflows have the same test coverage as any software system
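
One way to picture a goal-seeking workflow graph is as a set of steps, each with a goal test and an action, that are run until every goal holds. The sketch below is illustrative only: `Step`, `run_workflow`, and the "drain a link" state keys are hypothetical names, not Google's workflow meta-language or engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Step:
    name: str
    satisfied: Callable[[State], bool]  # is this step's goal already met?
    act: Callable[[State], None]        # action that moves the state toward the goal
    requires: List[str] = field(default_factory=list)  # prerequisite step names


def run_workflow(steps: List[Step], state: State, max_rounds: int = 10) -> List[str]:
    """Run actions until every step's goal holds; return the execution log."""
    log: List[str] = []
    by_name = {s.name: s for s in steps}
    for _ in range(max_rounds):
        pending = [s for s in steps if not s.satisfied(state)]
        if not pending:
            log.append("goal reached")
            return log
        # A step may run only once all of its prerequisites are satisfied.
        runnable = [s for s in pending
                    if all(by_name[r].satisfied(state) for r in s.requires)]
        if not runnable:
            raise RuntimeError("no runnable step; goal unreachable")
        for step in runnable:
            step.act(state)
            log.append(f"ran {step.name}")  # every action is logged
    if all(s.satisfied(state) for s in steps):
        log.append("goal reached")
        return log
    raise RuntimeError("workflow did not reach its goal")


# Hypothetical "drain a link" workflow: move traffic away, then mark the link drained.
state: State = {"traffic_on_link": 40, "drained": False}
steps = [
    Step("shift_traffic",
         satisfied=lambda s: s["traffic_on_link"] == 0,
         act=lambda s: s.update(traffic_on_link=0)),
    Step("mark_drained",
         satisfied=lambda s: s["drained"] is True,
         act=lambda s: s.update(drained=True),
         requires=["shift_traffic"]),
]
log = run_workflow(steps, state)
print(log)
```

Because each step carries its own goal test, rerunning the workflow on an already drained link performs no actions, which is what makes the graph goal-seeking rather than a fixed script.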

Network intent
● The workflow engine interacts with the infrastructure over transactional APIs
● Workflow intents are expressed at the network level, as changes to
  ○ Topology
  ○ Config
  ○ Functional calls

[Figure labels: operators; intent-based network management; “drain a link”; Workflow Engine; Workflow API]

Network Models
● OpenConfig (www.openconfig.net) for vendor-neutral configuration model
  ○ YANG for data modeling, gRPC as transport
  ○ Both configuration and op-state models
  ○ BGP, MPLS, ISIS, L2, Optical-transport, ACL, policy...
● “Unified Network Model” for topology
  ○ Protocol Buffer based Google internal schema
  ○ Describes all layer-0/1/2/3 abstractions

[Figure labels: Update Network model (Topology, Config); config / topology models; base model; extended model; local modifications; vendor modifications]
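
The figure labels suggest that an extended model is built from a base model plus local and vendor modifications. The slide does not say how the layers are combined, so the sketch below assumes a simple recursive overlay in which later layers override earlier ones. The keys are hypothetical and are not real OpenConfig paths.

```python
import copy
from typing import Any, Dict

Model = Dict[str, Any]


def overlay(base: Model, patch: Model) -> Model:
    """Return a new model: `patch` values override `base`, nested dicts merge."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys for illustration only.
base_model = {"bgp": {"neighbor": {"peer_as": None, "hold_time": 90}}}
local_modifications = {"bgp": {"neighbor": {"peer_as": 64512}}}
vendor_modifications = {"bgp": {"neighbor": {"vendor_knob": True}}}

extended_model = overlay(overlay(base_model, local_modifications),
                         vendor_modifications)
print(extended_model)
```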

Network Management Services
● Compose full config (vendor-neutral and vendor-specific) from topology/config intent update
● Provide secure transport of full config to network elements (OpenConfig + gRPC)
● Enforce operational policies
  ○ Rate limiting
  ○ Blast radius containment
  ○ Minimum survivable topology

[Figure labels: Topology; Config; Network Management Layer (configuration, commands)]
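
The three operational policies can be pictured as checks that run before a change is admitted. The sketch below is a toy under stated assumptions: the thresholds are made up, "blast radius" is approximated by the number of links touched in one change, and "minimum survivable topology" is approximated as "all nodes stay connected". `check_drain` is a hypothetical name, not part of any Google or OpenConfig API.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def is_connected(nodes: Set[str], links: List[Link]) -> bool:
    """Breadth-first search: True if every node is reachable from one start node."""
    if not nodes:
        return True
    adjacency: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


def check_drain(nodes: Set[str], links: List[Link], to_drain: List[Link],
                recent_changes: int, max_changes_per_window: int = 5,
                max_links_per_change: int = 1) -> List[str]:
    """Return the policy violations; an empty list means the drain is allowed."""
    violations: List[str] = []
    if recent_changes + 1 > max_changes_per_window:
        violations.append("rate limit exceeded")
    if len(to_drain) > max_links_per_change:
        violations.append("blast radius too large")
    remaining = [link for link in links if link not in to_drain]
    if not is_connected(nodes, remaining):
        violations.append("minimum survivable topology violated")
    return violations


nodes = {"a", "b", "c"}
links = [("a", "b"), ("b", "c"), ("a", "c")]
print(check_drain(nodes, links, [("a", "c")], recent_changes=0))
print(check_drain(nodes, links, [("a", "b"), ("a", "c")], recent_changes=0))
```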

Streaming Telemetry

Network state changes are observed by analyzing a comprehensive time-series data stream.
● Common schema for operational state data in OpenConfig
● Stream data continuously, with incremental updates
● Efficient, secure transport protocol: gRPC
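
To illustrate what "incremental updates" can mean for a consumer, the sketch below folds a stream of timestamped updates into a snapshot of operational state and records when each value last changed. The `Update` record and the path strings are hypothetical; this is not the OpenConfig schema or a gRPC client.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Update:
    timestamp: int           # collection time, e.g. nanoseconds since epoch
    path: str                # hypothetical path of an op-state leaf
    value: Optional[object]  # None means the leaf was deleted


def apply_updates(state: Dict[str, object],
                  updates: Iterable[Update]) -> Dict[str, int]:
    """Fold incremental updates into `state`; return last-change time per path."""
    last_changed: Dict[str, int] = {}
    for u in updates:
        if u.value is None:
            if u.path in state:
                del state[u.path]
                last_changed[u.path] = u.timestamp
        elif state.get(u.path) != u.value:
            # Only an actual change of value counts as a state change.
            state[u.path] = u.value
            last_changed[u.path] = u.timestamp
    return last_changed


snapshot: Dict[str, object] = {"interfaces/eth0/oper-status": "UP"}
stream = [
    Update(1, "interfaces/eth0/oper-status", "UP"),    # no change
    Update(2, "interfaces/eth0/oper-status", "DOWN"),  # state change
    Update(3, "interfaces/eth0/in-octets", 1200),      # new leaf
]
changes = apply_updates(snapshot, stream)
print(snapshot, changes)
```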

Workflow Safety
● Ability to automatically check the safety of operations
● Ability to repeatedly validate the network state against the stated intent
● Ability to recognize “bad” network behavior
● Ability to roll back to the original state
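
The last two abilities can be sketched as a change loop that checks health after every step and restores the original state on the first bad signal. This is a minimal illustration, assuming the network state fits in a dictionary and that a caller-supplied `healthy` predicate stands in for recognizing “bad” behavior; `apply_with_rollback` is a hypothetical name.

```python
import copy
from typing import Callable, Dict, List

NetworkState = Dict[str, object]


def apply_with_rollback(network: NetworkState,
                        changes: List[NetworkState],
                        healthy: Callable[[NetworkState], bool]) -> bool:
    """Apply changes one at a time; restore the original state if health degrades."""
    original = copy.deepcopy(network)
    for change in changes:
        network.update(change)
        if not healthy(network):
            # Halt and roll back to the state captured before the first change.
            network.clear()
            network.update(original)
            return False
    return True


def at_least_one_link_up(net: NetworkState) -> bool:
    return any(status == "up" for status in net.values())


network: NetworkState = {"link_a": "up", "link_b": "up"}
ok = apply_with_rollback(
    network,
    [{"link_a": "drained"}, {"link_b": "drained"}],  # second change isolates the site
    at_least_one_link_up,
)
print(ok, network)
```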

Do not treat a change to the network as an exceptional event

Lessons learned from a decade of high-availability network design

Changes are common
↓
Make it safe to evolve the network daily
↓
Scale just-in-time, scale often
↓
Evolve into a Zero Touch Network

References
● B4: Experience With a Globally Deployed Software Defined WAN [SIGCOMM 2013]
● Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network [SIGCOMM 2015]
● Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure [SIGCOMM 2016]
● Andromeda: Google’s cloud networking stack
● OpenConfig: http://www.openconfig.net
● gRPC: http://www.grpc.io
