Step-by-step guide for trying the Amlen operator

Version 1.1.0 introduces an operator for provisioning and managing Amlen. But what exactly is that, and what does it do? An operator can be thought of as an automated helper for Kubernetes: you tell it what you want, and it goes off and creates everything you need. Depending on the sophistication of the operator, you may need to give it more or less information, with it making intelligent choices for you.

You may hear talk of the maturity of an operator. Maturity levels correspond roughly to capabilities, and there are a couple of different definitions, but they all start from the basic level, where the operator performs an install, and go all the way up to dynamically scaling as usage changes. The Amlen operator is relatively immature: it handles installs and helps with upgrades, but scaling is down to the user.

All of this sounds very exciting, but what if you have never used an operator, or maybe haven't used Kubernetes before? This is where this step-by-step guide should help. It's not going to help you create your production system, which needs to be tailored to your specific needs, but it will get you an environment to play with.

CodeReady Containers

For this guide we will be using CodeReady Containers, a free OpenShift development environment designed to run on a single server. Most of the instructions apply to any Kubernetes environment, but there is some OpenShift-specific information (particularly towards the end).

The first step is to install CodeReady Containers. We aren't going to go into detail here because there is already a wealth of documentation, so head over to https://crc.dev/crc/ and get started.

Starting CodeReady Containers for the first time does take a while, but for this guide we will be deploying from the source code, so while you wait you can clone the repository from https://github.com/eclipse/amlen.

With the source code downloaded, change into the operator subdirectory.

Assuming CodeReady Containers has started, you can check the status via crc status; you should see something like:

CRC VM: Running
OpenShift: Running (v4.10.3)
Podman:
Disk Usage: 16.29GB of 32.74GB (Inside the CRC VM)
Cache Usage: 16.83GB
Cache Directory: ~/.crc/cache

Unfortunately, that isn't going to be enough disk space for a sensible Amlen deployment, so you need to increase the size, which means stopping and restarting. How you increase the disk size depends a little on how you installed CodeReady Containers and on your operating system, so you may have to do some googling if the following doesn't work. I run:

crc stop
qemu-img resize ~/.crc/machines/crc/crc.qcow2 +120G
crc start

(You may need to run the resize command as root and give the full path.)

Fortunately, restarting should be a lot quicker than the original start, although it can still take around 5 minutes. At this point when you check crc status it should look a little like:

CRC VM: Running
OpenShift: Running (v4.10.3)
Podman:
Disk Usage: 16.09GB of 161.6GB (Inside the CRC VM)
Cache Usage: 16.83GB
Cache Directory: ~/.crc/cache

The first thing we need to do is log in as admin and configure the oc command-line interface. When you start CodeReady Containers it gives you the commands for the developer login; to find the admin login, run crc console --credentials. Then you should be good to go:

eval $(crc oc-env)
oc login -u kubeadmin -p xxxxx-xxxxx-xxxxx-xxxxx https://api.crc.testing:6443

The password shouldn't change when you stop and restart CodeReady Containers, but it will if you delete the instance and create a new one.

The next step is to create the namespace we will deploy into. We are going to call it amlen, so we simply run:

oc new-project amlen

At this point you are pretty much ready to go; however, you may not have kubectl installed, and the Makefile (and the rest of this guide) uses kubectl. oc does everything that kubectl does, so you could change the Makefile to use oc, symlink oc to kubectl, or install kubectl. The symlink seems the simplest, so if kubectl isn't found, just run:

ln -s ~/.crc/bin/oc/oc ~/.crc/bin/oc/kubectl
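If you want to convince yourself the symlink trick behaves as expected before touching ~/.crc/bin, here is a throwaway sketch in a scratch directory, using a stub script in place of the real oc binary (the stub and its paths are purely hypothetical):

```shell
# Create a scratch dir with a stub "oc" that just echoes how it was called
tmp=$(mktemp -d)
printf '#!/bin/sh\necho "oc invoked as: $0 $*"\n' > "$tmp/oc"
chmod +x "$tmp/oc"

# Symlink it to "kubectl" and call it under the new name:
# the very same file runs, which is all the Makefile needs
ln -s "$tmp/oc" "$tmp/kubectl"
"$tmp/kubectl" get pods
```

The sketch only demonstrates that a symlinked name runs the identical binary; the real command is the ln -s against ~/.crc/bin/oc shown above.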

Deploying the operator

By now we should have the source code, be in its operator subdirectory, have a namespace created called amlen, and the kubectl command should work.

You need to work out which level of the operator to install. There are a few choices, but to keep things simple we will pick the operator built from the main GitHub branch:

export IMG=quay.io/amlen/operator:main

And to deploy that operator we just use:

make deploy

This will create a namespace called amlen-system with an operator inside it. To check the status of the operator you can use:

kubectl get pod -n amlen-system

which will hopefully show something like:

NAME READY STATUS RESTARTS AGE
amlen-controller-manager-78cbfc5c66-5vgjz 2/2 Running 0 78s

If it shows as ContainerCreating, wait a bit and it will hopefully move to Running on its own. The most likely reason for the operator not running is that the image name or version was not correct; you can check that via:

oc describe pod amlen-controller-manager-78cbfc5c66-5vgjz -n amlen-system

and then you should be able to see the Image in the manager section:

...
manager:
Container ID: cri-o://c94c35d7b81303c4fb6fb98536263a5ca3106bc53b27249fbf532d1a7456288f
Image: quay.io/amlen/operator:main
Image ID: quay.io/amlen/operator@sha256:612faf20b6d97170d02151366bb4f2c8aaa4db2cca48428e5233b258963095cd
...

If you do need to change the image, run make undeploy, change IMG, and run make deploy again to deploy with the new image.

Once you have a running operator, the next thing is a certificate manager. We will use cert-manager, a widely used open-source certificate manager. Adding it is very simple; just run:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml

This operator was developed using cert-manager 1.8.0, but 1.8.2 is now available, as well as a beta of 1.9.0; both should be compatible, as no breaking changes are mentioned. cert-manager can take a couple of minutes to start up; you can check the status by looking at the pods in the cert-manager namespace via:

kubectl get pod -n cert-manager

When everything is running you should see three pods as shown here:

NAME READY STATUS RESTARTS AGE
cert-manager-64d9bc8b74-24qhw 1/1 Running 0 92s
cert-manager-cainjector-6db6b64d5f-rbhsd 1/1 Running 0 92s
cert-manager-webhook-6c9dd55dc8-rvjdk 1/1 Running 0 92s

If cert-manager is not ready when you try to apply your ClusterIssuer, you will typically see errors about the webhook not being available or not responding.

At this point you are ready to create your deployment. We will walk through the sample deployment, using the files in config/samples inside the operator subdirectory.

Start with config/samples/selfsigned.yaml, which will create a ClusterIssuer using cert-manager:

kubectl apply -f config/samples/selfsigned.yaml

ClusterIssuers (as the name may hint) are not namespace specific. To check it is ready you can use:

kubectl get ClusterIssuer

Which will return:

NAME READY AGE
amlen-internal-issuer True 41s

As they are cluster-wide, you can have deployments in multiple namespaces using the same issuer. If you want two separate deployments in different namespaces using different issuers, you will need to alter the name in selfsigned.yaml and in amlen_v1_amlen.yaml.

The sample used in this guide will create two HA pairs (4 servers in total). By default each HA pair will create its own randomly generated admin password, but here we will change it so the first pair uses passw0rd:

kubectl apply -f config/samples/simple_password.yaml

This creates a secret in the amlen namespace (assuming you are in the amlen namespace/project, which can be confirmed via oc status in OpenShift). If you are not in the amlen namespace, you will need to specify it using -n amlen. You can check the secret is correct via:

kubectl get secret amlen-0-adminpassword -o yaml

This will show you the base64 encoded password:

...
data:
password: cGFzc3cwcmQ=
...
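The value is plain base64, so it can be decoded with the standard base64 tool; for example, with the value shown above:

```shell
# Decode the secret's data.password field; prints the clear-text password
echo 'cGFzc3cwcmQ=' | base64 -d
# prints: passw0rd
```

You can also combine the lookup and decode in one step: something like kubectl get secret amlen-0-adminpassword -o jsonpath='{.data.password}' | base64 -d should work, as jsonpath lets you pick out a single field.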

The sample uses a custom configuration for Amlen called amlen-config, so we need to apply that next:

kubectl apply -f config/samples/config.yaml

Again, this needs to be in the amlen namespace (or the namespace that you want to deploy Amlen into) and can be viewed via:

kubectl get configmap amlen-config -o yaml

As it is a config map rather than a secret, the data is shown in plain text. The difference between this configuration and the default is that it requires the devices to use certificates for authentication; we will look at how to change the config once we have a running system.

The sample also deploys an LDAP server which requires an ldap-config configmap to exist in the namespace. This is achieved by:

kubectl apply -f config/samples/config-ldap.yaml

At this point everything should be ready to deploy the actual Amlen systems.

kubectl apply -f config/samples/amlen_v1_amlen.yaml

This creates an amlen custom resource, so it can be viewed via:

kubectl get amlen

which should return:

NAME AGE
amlen-sample 49s

The sample creates 4 Amlen servers: amlen-0-0 and amlen-0-1, which make up the HA pair amlen-0, and amlen-1-0 and amlen-1-1, which make up amlen-1. It will also create the LDAP server. Creating them all can take up to 10 minutes; you will see the servers coming up one by one. After the second server in a pair is running, the first server will restart as part of the configuration process. You can check the progress via:

kubectl get pod

If all goes well you should eventually see:

NAME READY STATUS RESTARTS AGE
amlen-0-0 1/1 Running 1 (5m5s ago) 7m35s
amlen-0-1 1/1 Running 0 5m52s
amlen-1-0 1/1 Running 1 (2m17s ago) 3m48s
amlen-1-1 1/1 Running 0 3m5s
ldap-0 1/1 Running 0 59s

As you can see, we have the 4 Amlen servers running, the first server in each pair has restarted once, and the LDAP server is running.

If something goes wrong, it may not be entirely obvious where to look. If a pod is in the Pending state it could be that you have run out of a resource (disk space is the most common), and there are various states to do with being unable to pull the image. If a pod is not ready, you can look at its events; for example, for amlen-0-0 issue:

kubectl describe pod amlen-0-0

At the bottom of the output are the events, which will hopefully give some indication.

However, if the problem is not with the pods then you may need to look at the operator logs. Start by finding the operator’s pod:

kubectl get pod -n amlen-system

Using the name returned you can view the logs via:

kubectl logs amlen-controller-manager-78cbfc5c66-5vgjz -n amlen-system

Unfortunately, the logs are not very readable in a lot of cases. The general approach is to look for red text and see which TASK was running. In the following we can see that it failed when running the "get config map" task, so the configmap has been given the wrong name or is in the wrong namespace:

TASK [get config map] ****
fatal: [localhost]: FAILED! => {"api_found": true, "changed": false, "failed_when_result": true, "resources": []}

Exposure

Having an Amlen system running in Kubernetes is great, but you probably want to be able to interact with it from the outside world. In Kubernetes, communication is typically done via services, and a number of these are produced by the operator. They can be viewed via:

kubectl get service

And assuming you’ve used the samples you should have something like:

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
amlen ClusterIP 10.217.4.2 <none> 9089/TCP 34m
amlen-0 ClusterIP None <none> <none> 32m
amlen-0-0-admin ClusterIP 10.217.4.107 <none> 9089/TCP 32m
amlen-0-1-admin ClusterIP 10.217.5.174 <none> 9089/TCP 32m
amlen-0-mqtt LoadBalancer 10.217.4.218 <pending> 8883:31069/TCP 34m
amlen-1 ClusterIP None <none> <none> 29m
amlen-1-0-admin ClusterIP 10.217.5.146 <none> 9089/TCP 29m
amlen-1-1-admin ClusterIP 10.217.5.10 <none> 9089/TCP 29m
amlen-1-mqtt LoadBalancer 10.217.4.226 <none> 8883:30927/TCP 30m
ldap-service ClusterIP None <none> 1389/TCP,1636/TCP 28m

The most important ones from our point of view are amlen-0-mqtt and amlen-1-mqtt; they are the connections that the MQTT devices will use. So we need to expose them, but how you expose them depends on the Kubernetes environment. Using CodeReady Containers means we have access to the OpenShift expose command, which fortunately makes it rather simple:

oc expose service amlen-0-mqtt
oc expose service amlen-1-mqtt

Now to connect we need to know the host and port. We get the host by using:

kubectl get route

which will show us:

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
amlen-0-mqtt amlen-0-mqtt-amlen.apps-crc.testing amlen-0-mqtt mqtt-port None
amlen-1-mqtt amlen-1-mqtt-amlen.apps-crc.testing amlen-1-mqtt mqtt-port None

Unfortunately, the output includes the port name rather than the port number, so to find the port you need to look at the service, which uses the format <internal_port_number>:<external_port_number>/<type>. So when amlen-0-mqtt shows its port as 8883:31069/TCP, a device will need to connect to port 31069.
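If you'd rather script it than eyeball the column, the external port can be split out of that string with standard tools (the value here is the amlen-0-mqtt PORT(S) entry from the listing above):

```shell
# PORT(S) format is <internal_port>:<external_port>/<type>;
# take the field after the ':' and strip the '/TCP' suffix
ports='8883:31069/TCP'
echo "$ports" | cut -d: -f2 | cut -d/ -f1
# prints: 31069
```

Alternatively, kubectl can return the node port directly with kubectl get service amlen-0-mqtt -o jsonpath='{.spec.ports[0].nodePort}' (assuming the MQTT port is the first entry in the service spec).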

The other services you may want to expose are the admin services for each node; this will allow you to use the REST API to configure the system without having to exec onto a pod.

Credentials

We specified the admin password for amlen-0, but for amlen-1 a random password was generated. You can get this using:

kubectl get secret amlen-1-adminpassword -o yaml

The password will be base64 encoded, so you just need to decode it. Alternatively, as you have admin rights, you can use exec to print the password:

kubectl exec -it amlen-1-0 -- cat /secrets/adminpassword/password

The sample requires a TLS connection, which means you need to get hold of the certificates. The simplest approach is to get them from the Amlen pods. As secrets are stored via symlinks, and kubectl cp doesn't cope with symlinks, the easiest way is to cat the contents as we did with the password:

kubectl exec -it amlen-0-0 -- cat /secrets/internal_certs/ca.crt > ca.crt
kubectl exec -it amlen-0-0 -- cat /secrets/internal_certs/tls.crt > tls.crt
kubectl exec -it amlen-0-0 -- cat /secrets/internal_certs/tls.key > tls.key

For a quick check to see that it’s working we can use the ansible module created for use in unit tests:

python3 library/subscribeTestBasic.py -H amlen-0-mqtt-amlen.apps-crc.testing -c client1 -P 31069 -s --insecure -K tls.key -C tls.crt -t cert_test

It does require the paho.mqtt and requests modules, both of which can be installed using pip. If it’s successful you should get:

Test completed successfully

The test program is written for use in automated testing, and error handling is very limited.

Slightly more detailed logging can be found in /tmp/mqtt_sniff.log and you should see something like:

2022-07-08 14:55:55,331 INFO --------------------------------
2022-07-08 14:55:55,331 INFO host ............ amlen-0-mqtt-amlen.apps-crc.testing
2022-07-08 14:55:55,331 INFO topic ........... cert_test
2022-07-08 14:55:55,331 INFO clientid ........ client1
2022-07-08 14:55:55,331 INFO port ............ 31069
2022-07-08 14:55:55,331 INFO insecure ........ True
2022-07-08 14:55:55,331 INFO use-tls ......... True
2022-07-08 14:56:06,269 INFO Message Published
2022-07-08 14:56:06,372 INFO Message received on cert_test:b'hello'

But what about LDAP?

We told the operator to deploy LDAP, so why are we messing around with certificates? Unfortunately, our custom config doesn't allow password authentication (or was this merely a ruse to include instructions on how to change the config?). So the first step is changing the config.

The sample config is immutable. This means the amlen custom resource needs to be changed to use a different config, which in turn means the operator will see the config change without having to watch the configmaps.

So the first step is to define a new config, which is fairly easy. Copy the existing config.yaml from config/samples. At the top is the name: change it to amlen-config1. Then near the bottom is the SecurityProfile, which contains the IoTSecurityProfile: change UsePasswordAuthentication to true and UseClientCertificate to false. This is the security profile used by the MQTT endpoint.
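For orientation, the changed parts of the copied file will look roughly like this. This is a sketch, not the full sample: treat the immutable flag and the exact field layout as assumptions to check against your copy; only the renamed metadata and the two toggled settings matter:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: amlen-config1      # renamed; the original amlen-config cannot be edited
immutable: true            # assumption: the sample marks itself immutable
data:
  # ...rest of the sample unchanged, except inside
  # SecurityProfile -> IoTSecurityProfile:
  #   UsePasswordAuthentication: true   (was false)
  #   UseClientCertificate: false       (was true)
```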

You can now apply the config. If you have forgotten to rename it, the apply will fail because the existing configmap is immutable. Assuming you've copied the file into config/samples/config1.yaml, you simply run:

kubectl apply -f config/samples/config1.yaml

Then you need to change the amlen custom resource to use the new config. To do this we will edit it using:

kubectl edit amlen amlen-sample

This will open the definition in a text editor; you just need to go down to the spec and update the config field so it ends up something like:

spec:
  ha:
    cert_issuer:
      name: "amlen-selfsigned-issuer"
    enabled: true
  device_cert_issuer:
    mode: "automatic"
    name: "amlen-selfsigned-issuer"
  config: "amlen-config1"
  ldap:
    cert_issuer:
      name: "amlen-selfsigned-issuer"
    enabled: true

The operator will see that amlen-sample has changed and will apply the new config. Working out when that has been done is not necessarily easy. You can check the logs of the operator; hopefully they will end with:

--------------------------- Ansible Task StdOut -------------------------------
TASK [amlen : configure ldap]
task path: /opt/ansible/roles/amlen/tasks/configure_ldap.yml:27
{"level":"info","ts":1657291151.6938274,"logger":"logging_event_handler","msg":"[playbook task start]","name":"amlen-sample","namespace":"amlen","gvk":"amlen.com/v1alpha1, Kind=Amlen","event_type":"playbook_on_task_start","job":"2062086460060406162","EventData.Name":"amlen : configure ldap"}
{"level":"info","ts":1657291164.1741133,"logger":"runner","msg":"Ansible-runner exited successfully","job":"2062086460060406162","name":"amlen-sample","namespace":"amlen"}
----- Ansible Task Status Event StdOut (amlen.com/v1alpha1, Kind=Amlen, amlen-sample/amlen) -----
PLAY RECAP *
localhost : ok=54 changed=0 unreachable=0 failed=0 skipped=9 rescued=0 ignored=0

The last task it runs is configure ldap, so you are looking for a timestamp (ts) after the point at which you changed the config. If the log doesn't end with the PLAY RECAP, it is probably still running, so check back in a few seconds to see that it is progressing.
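The ts values are Unix epoch seconds, so to compare them with your wall clock you can convert one; for example, with the timestamp from the log excerpt above (GNU date syntax assumed):

```shell
# Convert an Ansible log "ts" (epoch seconds) to a readable UTC time
date -u -d @1657291151 '+%Y-%m-%d %H:%M:%S UTC'
# prints: 2022-07-08 14:39:11 UTC
```

On BSD/macOS the equivalent is date -u -r 1657291151.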

Another way of checking is to exec into the operator pod using the pod name found earlier:

oc exec -it amlen-controller-manager-78cbfc5c66-5vgjz -n amlen-system -- bash

The configure task creates a log at /tmp/configure.log, near the bottom of which you should find something like:

INFO:amlen-configurator:Attempting to connect to https://amlen-1-0.amlen-1.amlen.svc.cluster.local:9089/ima/v1/configuration/ to post {'MessageHub': {'IoTMessageHub': {'Description': 'Amlen Message Hub'}}, 'ConnectionPolicy': {'IoTConnectionPolicy': {'Protocol': 'MQTT', 'AllowDurable': True, 'AllowPersistentMessages': True, 'Description': 'IoT Connection Policy', 'MaxSessionExpiryInterval': 3888000}}, 'TopicPolicy': {'IoTMessagingPolicy': {'Topic': '', 'ActionList': 'Publish,Subscribe', 'Protocol': 'MQTT', 'MaxMessages': 5000, 'MaxMessagesBehavior': 'DiscardOldMessages', 'MaxMessageTimeToLive': 'unlimited'}}, 'SubscriptionPolicy': {'IoTSubscriptionPolicy': {'Subscription': '', 'ActionList': 'Receive,Control', 'Protocol': 'MQTT', 'MaxMessages': 100000, 'MaxMessagesBehavior': 'DiscardOldMessages'}}, 'TrustedCertificate': [{'TrustedCertificate': 'ca.crt', 'SecurityProfileName': 'IoTSecurityProfile', 'Overwrite': True}], 'CertificateProfile': {'IoTCertificate': {'Certificate': 'tls.crt', 'Key': 'tls.key', 'Overwrite': True}}, 'SecurityProfile': {'IoTSecurityProfile': {'MinimumProtocolMethod': 'TLSv1.2', 'UseClientCertificate': False, 'Ciphers': 'Fast', 'CertificateProfile': 'IoTCertificate', 'UseClientCipher': False, 'UsePasswordAuthentication': True, 'TLSEnabled': True}}, 'Endpoint': {'IoTSecureEndpoint0': {'Enabled': True, 'Port': 8883, 'Protocol': 'MQTT', 'SecurityProfile': 'IoTSecurityProfile', 'MessageHub': 'IoTMessageHub', 'Interface': 'all', 'MaxMessageSize': '1024KB', 'ConnectionPolicies': 'IoTConnectionPolicy', 'TopicPolicies': 'IoTMessagingPolicy', 'SubscriptionPolicies': 'IoTSubscriptionPolicy'}}, 'TraceBackupCount': 100}
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): amlen-1-0.amlen-1.amlen.svc.cluster.local:9089
DEBUG:urllib3.connectionpool:https://amlen-1-0.amlen-1.amlen.svc.cluster.local:9089 "POST /ima/v1/configuration/ HTTP/1.1" 200 113
INFO:amlen-configurator:* Configuration for amlen-1-0.amlen-1.amlen.svc.cluster.local updated successfully

That will hopefully show the config that you wanted with UsePasswordAuthentication as True and UseClientCertificate as False.

Adding LDAP users

In the README.md in the operator subdirectory is an example for adding two users to LDAP:

dn: cn=msgUser1,ou=users,dc=amleninternal,dc=auth
changetype: add
objectclass: inetOrgPerson
cn: msgUser1
givenname: msgUser1
sn: msgUser1
displayname: Messaging User 1
userpassword: msgUser1_pass

dn: cn=msgUser2,ou=users,dc=amleninternal,dc=auth
changetype: add
objectclass: inetOrgPerson
cn: msgUser2
givenname: msgUser2
sn: msgUser2
displayname: Messaging User 2
userpassword: msgUser2_pass

dn: cn=msgUsers,ou=groups,dc=amleninternal,dc=auth
changetype: add
cn: msgUsers
objectclass: groupOfUniqueNames
uniqueMember: cn=msgUser1,ou=users,dc=amleninternal,dc=auth
uniqueMember: cn=msgUser2,ou=users,dc=amleninternal,dc=auth

Create a new file called ldap.users and copy the above into it. Normally you would expose the LDAP service and then manage it remotely, but for this guide we will do it inside the LDAP pod. First you will need the admin password, which is in the secret ldap-adminpassword and can be found via:

kubectl get secret ldap-adminpassword -o yaml

Copy the file into the pod, exec in, and run the ldapadd command (remembering to decode the password via base64 -d):

kubectl cp ldap.users ldap-0:/tmp
kubectl exec -it ldap-0 -- bash
ldapadd -H ldap://localhost:1389 -D "cn=admin,dc=amleninternal,dc=auth" -w <> -f /tmp/ldap.users
exit

Now you are ready to check the configuration. We can again make use of the python test program:

python3 library/subscribeTestBasic.py -H "amlen-0-mqtt-amlen.apps-crc.testing" -c "client1" -P "31069" -s --insecure -u msgUser2 -p msgUser2_pass -t ldap_test

A successful run will add the following to /tmp/mqtt_sniff.log:

2022-07-11 16:05:24,829 INFO Message received on test1:b'hello'
2022-07-11 16:05:39,509 INFO --------------------------------
2022-07-11 16:05:39,509 INFO host ............ amlen-0-mqtt-amlen.apps-crc.testing
2022-07-11 16:05:39,510 INFO topic ........... ldap_test
2022-07-11 16:05:39,510 INFO clientid ........ client1
2022-07-11 16:05:39,510 INFO port ............ 31069
2022-07-11 16:05:39,510 INFO insecure ........ True
2022-07-11 16:05:39,510 INFO use-tls ......... True
2022-07-11 16:05:50,558 INFO Message Published
2022-07-11 16:05:50,560 INFO Message received on ldap_test:b'hello'

As it's a small program designed for running in the molecule tests, it does not have much error handling, so if the username or password is not accepted the program will loop trying to connect. If it does not complete within a minute, use Ctrl+C and check the details.

Next steps

You now have a test system to experiment with. This guide has used a lot of default values, which are designed so the system has enough memory and CPU to run these simple tests; it does not have the resources to run as a production system. CodeReady Containers is also not suitable for running in production. Amlen runs as an active-passive pair when using HighAvailability, but to make full use of that you want it deployed on a system with multiple worker nodes so that each server instance runs on a separate node.
