Istio service mesh: a start-to-finish tutorial with the sidecar architecture, plus an analysis and comparison of the ambient mesh architecture.
Istio, and service meshes in general, have seen several innovations in the past year, the introduction of ambient mesh being the latest. Service meshes have quickly become a critical component of many Kubernetes deployments. If this is an area that interests you, here is an end-to-end tutorial on getting Istio up and running in your Kubernetes cluster.
(Advanced users interested in ambient mesh can skip ahead to the analysis of how the introduction of ambient mesh evolves this area further.)
Step 1 : Verify your cluster
Using the command line
This tutorial was written for recent versions of Kubernetes; confirm yours with:
kubectl version --short
a) Verify the health of your Kubernetes cluster:
kubectl get --raw '/healthz?verbose'
b) Verify the status of your nodes:
kubectl get nodes
c) Verify that the control plane and CoreDNS are up and running:
kubectl cluster-info
d) Confirm the version of Helm used:
helm version --short
Using UI
Alternatively, you could use Kubernetes Dashboard to verify all of the above. As with all Kubernetes clusters exposed to the public, always remember to lock administrative access, including
access to the Dashboard.
If you don’t, this could happen - “The Tesla infection shows not only the brazenness of cryptojackers, but also how their attacks have become more subtle and sophisticated.”
Step 2 : Install Istio
a) Download Istio from here: https://github.com/istio/istio/tree/experimental-ambient
or
https://gcsweb.istio.io/gcs/istio-build/dev/0.0.0-ambient.191fe680b52c1754ee72a06b3e0d3f9d116f2e82
export ISTIO_VERSION=<Experimental Version>
This ISTIO_VERSION will be referenced by the installer.
This tutorial uses the Istio Operator with the Helm package manager for Kubernetes. Add the downloaded release to your path:
export PATH="$PATH:/root/istio-${ISTIO_VERSION}/bin"
b) Verify the version of the command-line tool:
istioctl version
The version will appear, and the message "no running Istio pods in istio-system" is expected, since nothing has started yet.
c) Initialize Istio on this Kubernetes cluster. This will start the operator and in turn, the operator will manage the installation and configuration of Istio for this cluster:
istioctl install --set profile=demo -y
Istio profiles
There are a few profiles to choose from based on the list of Istio features you want to enable:
Profiles provide customization of the Istio control plane and data plane.
You can start with one of Istio's built-in configuration profiles and then customize the configuration further for your specific needs. The demo profile is designed to showcase Istio functionality with modest resource requirements.
Reference:
https://istio.io/latest/docs/setup/additional-setup/config-profiles/
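As a sketch of what such customization looks like, an IstioOperator resource can start from the demo profile and override individual fields. The resource name and the specific overrides below are illustrative assumptions, not prescriptive values:

```yaml
# Hypothetical customization: start from the demo profile,
# disable the egress gateway, and pin pilot resource requests.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: demo-customized
  namespace: istio-system
spec:
  profile: demo
  components:
    egressGateways:
    - name: istio-egressgateway
      enabled: false
    pilot:
      k8s:
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
```

Saved as a file, this would be applied with `istioctl install -f <file>.yaml` instead of the `--set profile=demo` form used above.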
d) You can verify Istio is ready after a few minutes:
kubectl get deployments,services -n istio-system
- deployment.apps/istiod : the primary control plane component, the Istio daemon, which consolidates key components such as Pilot, Citadel, and Galley.
- Pilot : provides traffic management
- Galley : provides configuration management services.
- Citadel : certificate authority.
Architectural overview
This is a good time in the blog to get to the tasty bits about ambient mesh.
Here is a side-by-side comparison of the Istio architecture with sidecars vs. the ambient mesh architecture.
An Istio service mesh is logically split into a Data plane and a Control plane.
Data plane
- A set of intelligent proxies deployed as sidecars or as a shared zero-trust tunnel.
- They mediate and control all network communication between microservices.
- Collect and report telemetry on all mesh traffic.
Control plane
- Manages and configures the proxies to route traffic.
- The Istio project deemed a single istiod process a better design than breaking each primary component into separate Deployments and Pods.
Before diving deeper, a quick refresher on the 7-layer network stack
Ambient mesh : an analysis of Zero-trust tunnel and Waypoint proxy
ZTunnel
- Helps securely connect and authenticate elements within the mesh. The networking stack on the node redirects all traffic of participating workloads through the local ztunnel agent.
- Allows operators to enable, disable, scale, and upgrade the data plane without disturbing applications, enforcing separation of concerns between the data plane and the applications. The ztunnel performs no L7 processing on workload traffic, making it significantly leaner than sidecars.
- Compared to side-cars, processing L4 only significantly reduces the complexity and associated resource costs, making it ideal as a shared resource.
Waypoint proxy
- Handles L7 processing for workloads.
- They are namespace-specific: one waypoint per namespace (1:1).
- Control plane configures the ztunnels in the cluster to pass all traffic that requires L7 processing through the waypoint proxy.
*Performance : waypoint proxies are just regular pods that can be auto-scaled like any other Kubernetes deployment. This yields significant resource savings for users, as waypoint proxies can be auto-scaled to fit the real-time traffic demand of the namespaces they serve, not the maximum worst-case load operators expect.
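The waypoint provisioning API was still experimental at the time of writing, but in the experimental builds a waypoint is declared as a Kubernetes Gateway resource using the istio-waypoint gateway class. Treat the exact fields below as a sketch that may change as the API stabilizes:

```yaml
# Sketch: one waypoint proxy serving the "default" namespace.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: waypoint          # one waypoint per namespace (1:1)
  namespace: default
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008           # HBONE port used for ztunnel <-> waypoint traffic
    protocol: HBONE
```

Once this resource exists, the control plane configures the namespace's ztunnels to route L7 traffic through the waypoint pods it provisions.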
Why move away from side-cars to ztunnels and waypoint proxies?
Sidecars tightly couple applications to the Istio data plane.
- Installing or upgrading sidecars requires restarting the application pods, as sidecars are injected into applications by modifying their Kubernetes pod spec and redirecting traffic within the pod.
- Sidecar proxies have 1:1 relation with the associated workload, so resources must be provisioned for worst case usage of each individual pod.
- Performance: L7 (application layer) functions such as circuit breaking, HTTP routing, load balancing, rate limiting, timeouts, and advanced auth policies are computationally expensive, making sidecars a take-it-or-leave-it proposition. Applications needing only low-cost functions such as TCP routing, TLS with simple auth policies, or TCP access logging and tracing still take the performance hit, since they are coupled to an L7 processor even if they need none of the L7 functions.
Side-car concerns addressed in an Ambient Mesh architecture
- Freedom from the tyranny of L7 functions!
- Waypoint proxies can be scaled independently on a per-namespace basis. Tenants not needing L7 functions are spared the cost and the performance hit.
- L4 features provided by the ztunnel need a much smaller CPU and memory footprint.
- The separation/simplification allows the ztunnel to be replaced seamlessly by other secure tunnel implementations.
- Security: separating L4 and L7 functions reduces the vulnerable surface area.
- Colocated sidecars and workloads mean a vulnerability in one compromises the other. With a separation of concerns in the ambient mesh model, even if an application is compromised, the ztunnels and waypoint proxies can still enforce strict security policy on the compromised application's traffic. Given how battle-tested some of the proxies are (like Envoy), workloads are often the source of security issues; in this model, workloads can be individually isolated.
- Even though the ztunnel is shared, it only has access to the keys of the workloads currently on the node it runs on, along with a limited, L4-only attack surface.
Hoppiness (network latency) vs. L7 functions
As long as you don't run custom infrastructure (read: a Mac mini farm in your parents' basement) and use a standard cloud provider, the latency added by an extra hop should be dwarfed by the time saved through optimized, centralized L7 processing, or by skipping L7 functions altogether.
Some basic measurements :
- Payload : super-sized requests from https://github.com/solo-io/gloo-gateway-use-cases
- Deployment :
  - GKE Standard node pool
  - 8 x Dev, Region: Oregon, Provisioning model: Regular
  - Instance type: n1-highmem-96
  - Operating system / software: free
  - Persistent disk (accompanying): 8 x boot disk, product accompanying: GKE Standard, zonal balanced PD: 10 GiB
- Container : https://github.com/mock-server/mockserver/blob/master/mockserver-examples/docker_compose_examples/docker_compose_without_server_port/docker-compose.yml
Simply running 100,000 vanilla super-sized payloads over 10 mins (using a mock server) and measuring latency yielded :
1) L7 Filtering only
- With sidecars : [Avg : 150 ms][P99 : 400 ms][P99.5 : 1400 ms]
- With ambient mesh : [Avg : 120 ms][P99 : 340 ms][P99.5 : 991 ms]
2) L4 Filtering only
- With sidecars : [Avg : 160 ms][P99 : 390 ms][P99.5 : 1100 ms]
- With ambient mesh : [Avg : 50 ms][P99 : 190 ms][P99.5 : 1010 ms]
Even a simple scenario like this demonstrates an improvement of around 30% on overall latency.
Coming soon : throughput analysis and a throughput-vs-latency graph for both (as soon as my monthly GCP credits renew).
Install, continued :
e) Istio comes prepackaged with a command-line tool called istioctl, which can be used to verify the installation:
istioctl manifest generate --set profile=demo > $HOME/istio-generated-manifest.yaml
istioctl verify-install -f $HOME/istio-generated-manifest.yaml
If everything is healthy, you should see a listing of checked successfully messages for the Istio components.
f) A collection of supplemental integrations is offered for the Istio control plane; let's install them.
(use the links below to get the respective YAMLs)
Extract the SemVer components:
SEMVER_REGEX='[^0-9]*\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)\([0-9A-Za-z-]*\)'
INTEGRATIONS_VERSION=$(echo $ISTIO_VERSION | sed -e "s#$SEMVER_REGEX#\1#").$(echo $ISTIO_VERSION | sed -e "s#$SEMVER_REGEX#\2#") && echo $INTEGRATIONS_VERSION
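To sanity-check the extraction locally before pointing it at the integration manifests, the same pipeline can be run against a placeholder version string (1.16.1 here is purely illustrative; any MAJOR.MINOR.PATCH value behaves the same):

```shell
# Placeholder version string for demonstration only.
ISTIO_VERSION=1.16.1
SEMVER_REGEX='[^0-9]*\([0-9]*\)[.]\([0-9]*\)[.]\([0-9]*\)\([0-9A-Za-z-]*\)'
# Capture group \1 is the major version, \2 is the minor version.
MAJOR=$(echo "$ISTIO_VERSION" | sed -e "s#$SEMVER_REGEX#\1#")
MINOR=$(echo "$ISTIO_VERSION" | sed -e "s#$SEMVER_REGEX#\2#")
echo "${MAJOR}.${MINOR}"   # prints 1.16
```

The release branches on GitHub are named by major.minor only (e.g. release-1.16), which is why the patch component is dropped.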
Integration #1, Prometheus : open source monitoring system and time-series database for metrics, to record metrics that track the health of Istio and applications within the service mesh.
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-$INTEGRATIONS_VERSION/samples/addons/prometheus.yaml
Integration #2, Grafana : integrates well with time-series databases such as Prometheus and offers a language for creating custom dashboards for meaningful views into your metrics. Istio creates some optimized dashboards for viewing the mesh behaviors out of the box:
- Mesh : An overview of all services in the mesh.
- Service : Detailed breakdown of metrics for a service.
- Workload : Detailed breakdown of metrics for a workload.
- Performance : Monitors the resource usage of the mesh.
- Control Plane : Monitors the health and performance of the control plane.
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-$INTEGRATIONS_VERSION/samples/addons/grafana.yaml
Integration #3, Jaeger : distributed tracing and OpenTelemetry. Jaeger is an open source, end-to-end distributed tracing system that lets users monitor and troubleshoot transactions in complex distributed systems. Jaeger implements the OpenTracing specification.
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-$INTEGRATIONS_VERSION/samples/addons/jaeger.yaml
Integration #4, Kiali : an observability console for Istio with service mesh configuration and validation capabilities. It helps you understand the structure and health of your service mesh by monitoring traffic flow to infer the topology and report errors. It integrates with Grafana and Jaeger.
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-$INTEGRATIONS_VERSION/samples/addons/kiali.yaml
g) Injecting sidecars into a namespace
kubectl label namespace default istio-injection=enabled
kubectl apply -f istio-$ISTIO_VERSION/<path to YAML>/<application>.yaml
The application YAML file should contain manifests for the objects that define the application:
1) Deployments
2) Pods
3) Services
Note : Istio is independent from your application configuration and logic.
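For illustration, a minimal injectable application manifest might look like the following. The design app name, image, and port are hypothetical placeholders, not part of the Istio samples:

```yaml
# Hypothetical Deployment + Service pair; with the namespace labeled
# istio-injection=enabled, the sidecar is injected at pod creation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: design-v1
  labels:
    app: design
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: design
      version: v1
  template:
    metadata:
      labels:
        app: design
        version: v1
    spec:
      containers:
      - name: design
        image: example/design:v1   # placeholder image
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: design
spec:
  selector:
    app: design
  ports:
  - name: http
    port: 8080
```

The app and version labels matter: Istio uses them for telemetry attribution and, later, for routing between subsets.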
h) Set up service access
An Istio Gateway is used to define the ingress into the mesh. Define the ingress gateway for the application:
kubectl apply -f istio-$ISTIO_VERSION/<path to yaml>/gateway.yaml
Confirm the gateway has been created:
kubectl get gateway
Confirm the app is now accessible through the mesh Gateway:
curl http://${GATEWAY_URL}/apppage
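The gateway.yaml above is not reproduced in this post, but a minimal sketch of an ingress Gateway plus a VirtualService routing the /apppage path could look like this (the hostnames and the backing service name are assumptions):

```yaml
# Sketch: expose /apppage through Istio's default ingress gateway.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: app-gateway
spec:
  selector:
    istio: ingressgateway     # bind to the default ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
  - "*"
  gateways:
  - app-gateway
  http:
  - match:
    - uri:
        prefix: /apppage
    route:
    - destination:
        host: design          # placeholder in-mesh service
        port:
          number: 8080
```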
i) Set up public access : connecting the ingress Service to the mesh gateway
istio-ingressgateway is a Pod with a Service of the type LoadBalancer that accepts this traffic.
kubectl get service istio-ingressgateway -n istio-system
Until a bridge is defined between the lab's load balancer and the ingress gateway service, the external IP will be stuck at pending. To create this bridge, use:
kubectl patch service -n istio-system istio-ingressgateway -p '{"spec": {"type": "LoadBalancer", "externalIPs":["172.30.12.5"]}}'
Verify the pending status has changed to the host IP:
kubectl get service istio-ingressgateway -n istio-system
The Istio installation is now complete.
L7 Demonstration, Traffic Management
Traffic can be controlled and routed based on information in the HTTP request headers. Routing decisions can be made based on the presence of data such as user agent strings, key/values, IP addresses, or cookies.
#1 Showing Don Norman a refined design for review using end-user header
This will send all traffic from Don Norman (don-norman) to design:v2 but all other users to design:v1, creating a safe space for a design review.
kubectl apply -f istio-$ISTIO_VERSION/<path to yaml>/virtual-service-design-v2.yaml
#2 Splitting Releases
Speaking of Don Norman, one of his favorite areas is A/B testing; the ability to split traffic for testing and rolling out changes is important. We can create a virtual service and apply it like so:
kubectl apply -f istio-$ISTIO_VERSION/<path to yaml>/virtual-service-design-50-v3.yaml
- The weighting is not round robin; multiple requests may go to the same service, but over many calls the statistics eventually work out.
- Rule #1 will take precedence over rule #2.
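As a sketch, both rules can live in a single VirtualService, where rule order gives the header match precedence. The subsets and the end-user header here are assumptions based on the examples above, and the v1/v2/v3 subsets would additionally need a DestinationRule defining them:

```yaml
# Sketch: header-based routing for one user, weighted split for everyone else.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: design
spec:
  hosts:
  - design
  http:
  - match:                      # rule #1: evaluated first
    - headers:
        end-user:
          exact: don-norman
    route:
    - destination:
        host: design
        subset: v2
  - route:                      # rule #2: all remaining traffic, 50/50
    - destination:
        host: design
        subset: v1
      weight: 50
    - destination:
        host: design
        subset: v3
      weight: 50
```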
Inspection and Monitoring
istioctl
List all the Envoy sidecars:
istioctl proxy-status
Investigate the mesh rules and configurations:
istioctl analyze --help
Each integration deserves its own blog, here is a quick preview of their features.
The Grafana dashboard, out of the box, shows the total number of requests currently being processed, along with the number of errors and the response time of each call.
Kiali, for monitoring and alerting on Istio health and metrics.
Jaeger Tracing Dashboard, provides tracing information for each HTTP request. It shows which calls are made and where the time was spent within each request. Click on a span to view the details on an individual request and the HTTP calls made. This is an excellent way to identify issues and potential performance bottlenecks.
If you're using ambient mesh in your architecture and have thoughts/ideas/throughput analysis, I'd love to discuss. If the tutorial helped you get familiar with Istio, click the Buy Me a Coffee button :)
References and credits :
1) https://www.solo.io/products/ambient-mesh/
2) https://learning.oreilly.com/library/view/istio-up-and/9781492043775/
3) https://electricala2z.com/cloud-computing/osi-model-layers-7-layers-osi-model/
4) https://kiali.io/
5) https://www.jaegertracing.io/
6) https://grafana.com/
7) https://prometheus.io/
8) Ambient mesh product launch