16 KiB
xDS Kubernetes Interop Tests
Proxyless Security Mesh Interop Tests executed on Kubernetes.
Experimental
Work in progress. Internal APIs may and will change. Please refrain from making changes to this codebase at the moment.
Stabilization roadmap
- Replace retrying with tenacity
- Generate namespace for each test to prevent resource name conflicts and allow running tests in parallel
- Security: run server and client in separate namespaces
- Make framework.infrastructure.gcp resources first-class citizen, support simpler CRUD
- Security: manage
roles/iam.workloadIdentityUser
role grant lifecycle for dynamically-named namespaces - Restructure
framework.test_app
andframework.xds_k8s*
into a module containing xDS-interop-specific logic - Address inline TODOs in code
- Improve README.md documentation, explain helpers in bin/ folder
Installation
Requirements
- Python v3.6+
- Google Cloud SDK
kubectl
kubectl
can be installed via gcloud components install kubectl
, or system package manager: https://kubernetes.io/docs/tasks/tools/#kubectl
Getting Started
- If you haven't, initialize gcloud SDK
- Activate gcloud configuration with your project
- Enable gcloud services:
gcloud services enable \ compute.googleapis.com \ container.googleapis.com \ networksecurity.googleapis.com \ networkservices.googleapis.com \ secretmanager.googleapis.com \ trafficdirector.googleapis.com
Configure GKE cluster
This is an example outlining minimal requirements to run the baseline tests.
Update gloud sdk:
gcloud -q components update
Pre-populate environment variables for convenience. To find project id, refer to Identifying projects.
export PROJECT_ID="your-project-id"
export PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format="value(projectNumber)")
# Compute Engine default service account
export GCE_SA="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"
# The prefix to name GCP resources used by the framework
export RESOURCE_PREFIX="xds-k8s-interop-tests"
# The zone name your cluster, f.e. xds-k8s-test-cluster
export CLUSTER_NAME="${RESOURCE_PREFIX}-cluster"
# The zone of your cluster, f.e. us-central1-a
export ZONE="us-central1-a"
# Dedicated GCP Service Account to use with workload identity.
export WORKLOAD_SA_NAME="${RESOURCE_PREFIX}"
export WORKLOAD_SA_EMAIL="${WORKLOAD_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
Create the cluster
Minimal requirements: VPC-native cluster with Workload Identity enabled.
gcloud container clusters create "${CLUSTER_NAME}" \
--scopes=cloud-platform \
--zone="${ZONE}" \
--enable-ip-alias \
--workload-pool="${PROJECT_ID}.svc.id.goog" \
--workload-metadata=GKE_METADATA \
--tags=allow-health-checks
For security tests you also need to create CAs and configure the cluster to use those CAs as described here.
Create the firewall rule
Allow health checking mechanisms
to query the workloads health.
This step can be skipped, if the driver is executed with --ensure_firewall
.
gcloud compute firewall-rules create "${RESOURCE_PREFIX}-allow-health-checks" \
--network=default --action=allow --direction=INGRESS \
--source-ranges="35.191.0.0/16,130.211.0.0/22" \
--target-tags=allow-health-checks \
--rules=tcp:8080-8100
Setup GCP Service Account
Create dedicated GCP Service Account to use with workload identity.
gcloud iam service-accounts create "${WORKLOAD_SA_NAME}" \
--display-name="xDS K8S Interop Tests Workload Identity Service Account"
Enable the service account to access the Traffic Director API.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
--member="serviceAccount:${WORKLOAD_SA_EMAIL}" \
--role="roles/trafficdirector.client"
Allow access to images
The test framework needs read access to the client and server images and the bootstrap
generator image. You may have these images in your project but if you want to use these
from the grpc-testing project you will have to grant the necessary access to these images
using https://cloud.google.com/container-registry/docs/access-control#grant or a
gsutil command. For example, to grant access to images stored in grpc-testing
project GCR, run:
gsutil iam ch "serviceAccount:${GCE_SA}:objectViewer" gs://artifacts.grpc-testing.appspot.com/
Allow test driver to configure workload identity automatically
Test driver will automatically grant roles/iam.workloadIdentityUser
to
allow the Kubernetes service account to impersonate the dedicated GCP workload
service account (corresponds to the step 5
of Authenticating to Google Cloud).
This action requires the test framework to have iam.serviceAccounts.create
permission on the project.
If you're running test framework locally, and you have roles/owner
to your
project, you can skip this step.
If you're configuring the test framework to run on a CI: use roles/owner
account once to allow test framework to grant roles/iam.workloadIdentityUser
.
# Assuming CI is using Compute Engine default service account.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
--member="serviceAccount:${GCE_SA}" \
--role="roles/iam.serviceAccountAdmin" \
--condition-from-file=<(cat <<-END
---
title: allow_workload_identity_only
description: Restrict serviceAccountAdmin to granting role iam.workloadIdentityUser
expression: |-
api.getAttribute('iam.googleapis.com/modifiedGrantsByRole', [])
.hasOnly(['roles/iam.workloadIdentityUser'])
END
)
Configure GKE cluster access
# Unless you're using GCP VM with preconfigured Application Default Credentials, acquire them for your user
gcloud auth application-default login
# Configuring GKE cluster access for kubectl
gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone "${ZONE}"
# Save generated kube context name
export KUBE_CONTEXT="$(kubectl config current-context)"
Install python dependencies
# Create python virtual environment
python3.6 -m venv venv
# Activate virtual environment
. ./venv/bin/activate
# Install requirements
pip install -r requirements.txt
# Generate protos
python -m grpc_tools.protoc --proto_path=../../../ \
--python_out=. --grpc_python_out=. \
src/proto/grpc/testing/empty.proto \
src/proto/grpc/testing/messages.proto \
src/proto/grpc/testing/test.proto
Basic usage
Local development
This test driver allows running tests locally against remote GKE clusters, right from your dev environment. You need:
- Follow installation instructions
- Authenticated
gcloud
kubectl
context (see Configure GKE cluster access)- Run tests with
--debug_use_port_forwarding
argument. The test driver will automatically start and stop port forwarding usingkubectl
subprocesses. (experimental)
Making changes to the driver
- Install additional dev packages:
pip install -r requirements-dev.txt
- Use
./bin/yapf.sh
and./bin/isort.sh
helpers to auto-format code.
Setup test configuration
There are many arguments to be passed into the test run. You can save the
arguments to a config file ("flagfile") for your development environment.
Use config/local-dev.cfg.example
as a starting point:
cp config/local-dev.cfg.example config/local-dev.cfg
If you exported environment variables in the above sections, you can template them into the local config (note this recreates the config):
envsubst < config/local-dev.cfg.example > config/local-dev.cfg
Learn more about flagfiles in abseil documentation.
Test suites
See the full list of available test suites in the tests/
folder.
xDS Baseline Tests
Test suite meant to confirm that basic xDS features work as expected. Executing it before other test suites will help to identify whether test failure related to specific features under test, or caused by unrelated infrastructure disturbances.
# Help
python -m tests.baseline_test --help
python -m tests.baseline_test --helpfull
# Run the baseline test with local-dev.cfg settings
python -m tests.baseline_test --flagfile="config/local-dev.cfg"
# Same as above, but using the helper script
./run.sh tests/baseline_test.py
xDS Security Tests
Test suite meant to verify mTLS/TLS features. Note that this requires additional environment configuration. For more details, and for the setup for the security tests, see "Setting up Traffic Director service security with proxyless gRPC" user guide.
# Run the security test with local-dev.cfg settings
python -m tests.security_test --flagfile="config/local-dev.cfg"
# Same as above, but using the helper script
./run.sh tests/security_test.py
Helper scripts
You can use interop xds-k8s bin/
scripts to configure TD, start k8s instances step-by-step, and keep them alive
for as long as you need.
- To run helper scripts using local config:
python -m bin.script_name --flagfile=config/local-dev.cfg
./run.sh bin/script_name.py
automatically appends the flagfile
- Use
--help
to see script-specific argument - Use
--helpfull
to see all available argument
Overview
# Helper tool to configure Traffic Director with different security options
python -m bin.run_td_setup --help
# Helper tools to run the test server, client (with or without security)
python -m bin.run_test_server --help
python -m bin.run_test_client --help
# Helper tool to verify different security configurations via channelz
python -m bin.run_channelz --help
./run.sh
helper
Use ./run.sh
to execute helper scripts and tests with config/local-dev.cfg
.
USAGE: ./run.sh script_path [arguments]
script_path: path to python script to execute, relative to driver root folder
arguments ...: arguments passed to program in sys.argv
ENVIRONMENT:
XDS_K8S_CONFIG: file path to the config flagfile, relative to
driver root folder. Default: config/local-dev.cfg
Will be appended as --flagfile="config_absolute_path" argument
XDS_K8S_DRIVER_VENV_DIR: the path to python virtual environment directory
Default: $XDS_K8S_DRIVER_DIR/venv
DESCRIPTION:
This tool performs the following:
1) Ensures python virtual env installed and activated
2) Exports test driver root in PYTHONPATH
3) Automatically appends --flagfile="\$XDS_K8S_CONFIG" argument
EXAMPLES:
./run.sh bin/run_td_setup.py --help
./run.sh bin/run_td_setup.py --helpfull
XDS_K8S_CONFIG=./path-to-flagfile.cfg ./run.sh bin/run_td_setup.py --resource_suffix=override-suffix
./run.sh tests/baseline_test.py
./run.sh tests/security_test.py --verbosity=1 --logger_levels=__main__:DEBUG,framework:DEBUG
./run.sh tests/security_test.py SecurityTest.test_mtls --nocheck_local_certs
Partial setups
Regular workflow
# Setup Traffic Director
./run.sh bin/run_td_setup.py
# Start test server
./run.sh bin/run_test_server.py
# Add test server to the backend service
./run.sh bin/run_td_setup.py --cmd=backends-add
# Start test client
./run.sh bin/run_test_client.py
Secure workflow
# Setup Traffic Director in mtls. See --help for all options
./run.sh bin/run_td_setup.py --security=mtls
# Start test server in a secure mode
./run.sh bin/run_test_server.py --secure
# Add test server to the backend service
./run.sh bin/run_td_setup.py --cmd=backends-add
# Start test client in a secure more --secure
./run.sh bin/run_test_client.py --secure
Sending RPCs
Start port forwarding
# Client: all services always on port 8079
kubectl port-forward deployment.apps/psm-grpc-client 8079
# Server regular mode: all grpc services on port 8080
kubectl port-forward deployment.apps/psm-grpc-server 8080
# OR
# Server secure mode: TestServiceImpl is on 8080,
kubectl port-forward deployment.apps/psm-grpc-server 8080
# everything else (channelz, healthcheck, CSDS) on 8081
kubectl port-forward deployment.apps/psm-grpc-server 8081
Send RPCs with grpccurl
# 8081 if security enabled
export SERVER_ADMIN_PORT=8080
# List server services using reflection
grpcurl --plaintext 127.0.0.1:$SERVER_ADMIN_PORT list
# List client services using reflection
grpcurl --plaintext 127.0.0.1:8079 list
# List channels via channelz
grpcurl --plaintext 127.0.0.1:$SERVER_ADMIN_PORT grpc.channelz.v1.Channelz.GetTopChannels
grpcurl --plaintext 127.0.0.1:8079 grpc.channelz.v1.Channelz.GetTopChannels
# Send GetClientStats to the client
grpcurl --plaintext -d '{"num_rpcs": 10, "timeout_sec": 30}' 127.0.0.1:8079 \
grpc.testing.LoadBalancerStatsService.GetClientStats
Cleanup
- First, make sure to stop port forwarding, if any
- Run
./bin/cleanup.sh
Partial cleanup
You can run commands below to stop/start, create/delete resources however you want.
Generally, it's better to remove resources in the opposite order of their creation.
Cleanup regular resources:
# Cleanup TD resources
./run.sh bin/run_td_setup.py --cmd=cleanup
# Stop test client
./run.sh bin/run_test_client.py --cmd=cleanup
# Stop test server, and remove the namespace
./run.sh bin/run_test_server.py --cmd=cleanup --cleanup_namespace
Cleanup regular and security-specific resources:
# Cleanup TD resources, with security
./run.sh bin/run_td_setup.py --cmd=cleanup --security=mtls
# Stop test client (secure)
./run.sh bin/run_test_client.py --cmd=cleanup --secure
# Stop test server (secure), and remove the namespace
./run.sh bin/run_test_server.py --cmd=cleanup --cleanup_namespace --secure
In addition, here's some other helpful partial cleanup commands:
# Remove all backends from the backend services
./run.sh bin/run_td_setup.py --cmd=backends-cleanup
# Stop the server, but keep the namespace
./run.sh bin/run_test_server.py --cmd=cleanup --nocleanup_namespace
Known errors
Error forwarding port
If you stopped a test with ctrl+c
, while using --debug_use_port_forwarding
,
you might see an error like this:
framework.infrastructure.k8s.PortForwardingError: Error forwarding port, unexpected output Unable to listen on port 8081: Listeners failed to create with the following errors: [unable to create listener: Error listen tcp4 127.0.0.1:8081: bind: address already in use]
Unless you're running kubectl port-forward
manually, it's likely that ctrl+c
interrupted python before it could clean up subprocesses.
You can do ps aux | grep port-forward
and then kill the processes by id,
or with killall kubectl