Hot off the press from AWS re:Invent in Las Vegas, this post is a summary of the Netflix OSS session we attended yesterday.

From Asgard to Zuul: How Netflix’s Proven Open Source Tools Can Help Accelerate and Scale Your Services
The Netflix OSS session dealt mainly with the tools Netflix has released to the community, all of them developed from its own experience of operating at enormous scale. Let’s not forget that Netflix is one of AWS’s most important clients, having moved 100% to the Cloud and dismantled its entire on-premises infrastructure.
Their tools are stored at:
http://netflix.github.io/#repo
“Sabotage” in the Cloud
Among the most interesting points that arose during the session were their Cloud “sabotage” tools:
- Chaos Monkey: Randomly terminates individual EC2 instances.
- Chaos Gorilla: Takes out an entire Availability Zone (AZ).
- Chaos Kong: Takes out an entire AWS region.
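To make the difference in blast radius between the three tools concrete, here is a minimal Python sketch. It is not Netflix's code and does not call AWS: the `pick_victims` function, the `fleet` inventory and its `id`/`az`/`region` fields are all hypothetical, chosen only to show that each tool widens the scope of the failure it injects.

```python
import random


def pick_victims(instances, scope, rng):
    """Return the ids that a chaos tool of the given scope would take down.

    instances: list of dicts with 'id', 'az' and 'region' keys (hypothetical shape).
    scope: 'instance' (Chaos Monkey), 'az' (Chaos Gorilla) or 'region' (Chaos Kong).
    rng: any object with a choice(seq) method, e.g. random.Random().
    """
    if not instances:
        return []
    # Pick one instance at random; its AZ or region defines the wider blast radius.
    target = rng.choice(instances)
    if scope == "instance":
        return [target["id"]]
    if scope in ("az", "region"):
        return [i["id"] for i in instances if i[scope] == target[scope]]
    raise ValueError(f"unknown scope: {scope}")


# Hypothetical inventory, for illustration only.
fleet = [
    {"id": "i-1", "az": "us-east-1a", "region": "us-east-1"},
    {"id": "i-2", "az": "us-east-1b", "region": "us-east-1"},
    {"id": "i-3", "az": "us-east-1b", "region": "us-east-1"},
    {"id": "i-4", "az": "us-west-2a", "region": "us-west-2"},
]

rng = random.Random()
for scope in ("instance", "az", "region"):
    print(scope, pick_victims(fleet, scope, rng))
```

The sketch only selects victims. A real tool would go on to terminate them or cut them off, which is exactly what forces the rest of the platform to prove it can survive the loss.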
Micro-service management
In addition to the sabotage services, Netflix is one of the top advocates of dividing infrastructure into micro-services. As with the Amazon store, this division lets them have specific teams for specific tasks, assign managers per feature and make the whole platform more resistant to failures in individual applications. In other words, instead of having one application with all the features, Netflix is made up of small applications that are called from the frontend. These small applications might be, for example, film recommendations, user information, latest episodes, etc.
To control these micro-services, they have two applications that they have also made available to the community:
- Eureka
- AWS Service registry for resilient mid-tier load balancing and failover.
- Eureka is in charge of controlling a map of the micro-services installed on AWS, used to route the calls to the API.
- https://github.com/Netflix/eureka
- Hystrix
- Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.
- In order to free the teams that program the micro-services from high-availability concerns, this library manages the retries, failovers and errors that occur in calls to the different APIs.
- This library standardizes optimization tasks (request caching and grouping) and incident management tasks, making it easier for programmers to decide what to do when something doesn’t work as expected (for example, returning a response from cache or returning a pre-defined response when the service that should have performed the task is not available).
- https://github.com/Netflix/Hystrix
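The two roles described in the list above can be sketched together in a few lines of Python. This is only an illustration of the ideas, not the Eureka or Hystrix API (both are Java projects): `ServiceRegistry`, `GuardedCall`, `max_failures` and the recommendation functions are hypothetical names. The registry keeps a map from service name to instances and rotates between them. The guarded call returns a pre-defined response when the service fails, and stops calling it altogether after a number of consecutive failures.

```python
class ServiceRegistry:
    """Toy in-memory registry: service name -> list of callable instances."""

    def __init__(self):
        self._instances = {}
        self._next = {}

    def register(self, name, instance):
        self._instances.setdefault(name, []).append(instance)

    def lookup(self, name):
        """Return the next instance for a service, rotating round-robin."""
        instances = self._instances.get(name, [])
        if not instances:
            raise LookupError(f"no instances registered for {name!r}")
        index = self._next.get(name, 0)
        self._next[name] = index + 1
        return instances[index % len(instances)]


class GuardedCall:
    """Wraps a remote call with a fallback and a simple failure counter."""

    def __init__(self, registry, service, fallback, max_failures=3):
        self.registry = registry
        self.service = service
        self.fallback = fallback
        self.max_failures = max_failures
        self.failures = 0  # consecutive failures seen so far

    def execute(self, *args):
        # "Open circuit": after too many consecutive failures, skip the call.
        # Assumption: no timed recovery here; a fuller version would retry later.
        if self.failures >= self.max_failures:
            return self.fallback(*args)
        try:
            result = self.registry.lookup(self.service)(*args)
        except Exception:
            self.failures += 1
            return self.fallback(*args)
        self.failures = 0
        return result


def broken_recommendations(user):
    raise ConnectionError("recommendation service unavailable")


def default_recommendations(user):
    # Pre-defined response served when the real service cannot answer.
    return ["Popular title A", "Popular title B"]


registry = ServiceRegistry()
registry.register("recommendations", broken_recommendations)
call = GuardedCall(registry, "recommendations", default_recommendations, max_failures=2)
print(call.execute("alice"))
```

The frontend code calling `execute` never sees the exception: it gets either the real answer or the fallback, which is the behaviour that keeps one failing micro-service from dragging down the others.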
And now, with Docker

And to top it all off, Netflix has also been busy adapting its set of tools to Docker, making it possible to get up and running in less than 10 minutes and start experimenting on any OS capable of running containers:
https://github.com/Netflix-Skunkworks/zerotodocker
References:
Netflix Open Source Software Center: http://netflix.github.io/#repo
Eureka source code on Github: https://github.com/Netflix/eureka
Hystrix source code on Github: https://github.com/Netflix/Hystrix
Netflix’s Dockerfiles on Github: https://github.com/Netflix-Skunkworks/zerotodocker
Chaos Monkey entry on the Netflix blog: http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
