This is the blog section. It has two categories: News and Releases.
Files in these directories will be listed in reverse chronological order.
Manhattan Active® Platform Release Engineering is a subsystem consisting of processes, conventions, and a set of tools that enable the transit and promotion of code from a developer’s laptop to a production environment. This document describes the inner workings of this subsystem.
Release Engineering consists of two main aspects:
The typical development process can be summarized as the following:
To build and test the code that developers commit to the Git repositories, we have put a Continuous Integration process in place using Jenkins.
The topology has been designed to minimize inbound traffic while ensuring the security and stability of the CI subsystem. Jenkins is deployed on a Kubernetes cluster, with the worker (“slave” in Jenkins-speak) nodes being Kubernetes pods that are dynamically provisioned when a build event occurs.
Key features of the CI pipeline with Jenkins:
The Continuous Integration Pipeline is the backbone of the release engineering process during the development cycles, responsible for building, testing, and publishing the standard artifacts (i.e., Docker images, NPM packages, WARs, JARs, or Zip archives). The CI pipeline uses Jenkins multi-branch pipeline jobs to perform tasks specific to the target Bitbucket repository, and serves components as well as frameworks. This section summarizes the CI pipelines that are most relevant to developers:
The component pipeline job runs once for each component and is triggered when a new commit is pushed to a target branch of the component’s Bitbucket repository. The component pipeline job monitors a set of known branch name patterns for which a build is automatically triggered:
The purpose of the component’s master branch pipeline is to build and test the component’s code from its master branch, where the code from other branches is merged and integrated. Only the artifacts built, tested, and published from the master branch are promoted as release candidates and subsequently released for downstream deployments, including the Ops-managed customer environments. The one exception to this rule is the priority pipeline (also known as the “Hot Fix” or “X” pipeline, described in the “priority Branch” section below), where code from a branch other than master is promoted to downstream environments to address priority or “hot” defects.
Commits pushed to the master branch of the component’s Bitbucket repository trigger the pipeline job, which goes through the phases summarized below. Clicking on each phase shows the list of activities performed and the logs produced by that phase:
Init: Initializes the job and the workspace for the job
Clone: Clones the target component’s repository at the tip commit of the master branch; records the tip commit ID, which will be used for tagging purposes
BaseImage: Builds the Docker image for the component. The built Docker image is tagged as base-<commit_id>-<build_number> and pushed to Quay.io; it is then used in subsequent phases. Note that the Docker image is never built again in the entire pipeline: the same Docker image built in this phase “travels” through the rest of the pipeline, and upon successful completion of the pipeline, it is tagged as gold and pushed to Quay.io
Parallel: Two identical phases (Parallel-Y and Parallel-Z) are triggered in parallel. Both phases run an identical set of tests, the only difference being the value of the ACTIVE_RELEASE_ID environment variable. This phase runs the full suite of tests that accompany the component, which typically includes end-to-end tests (components can customize which of these types of tests to run or skip via their Jenkinsfile using relevant parameters).
Parallel-Y: This parallel phase is also known as the “Y Pipeline”. It runs the identical set of tests as the Parallel-Z phase, but with ACTIVE_RELEASE_ID set to the most recently released (“GA”) release identifier. For example, if the most recently released quarterly release is 19.3 (or 2019-06), then the global environment configuration for the Parallel-Y phase will have ACTIVE_RELEASE_ID=2019-06. By setting ACTIVE_RELEASE_ID to the most recent GA release identifier, the tests in this phase use the feature flags configured to be enabled for that release. In other words, with the example of the 19.3 release, the Parallel-Y phase runs the component’s tests assuming the feature flags for the 19.3 release are enabled, but the feature flags for the upcoming 19.4 release are disabled. For full details on how feature flags work, see this document.
Parallel-Z: This parallel phase is also known as the “Z Pipeline”. It runs the identical set of tests as the Parallel-Y phase, but with ACTIVE_RELEASE_ID set to the upcoming quarterly release identifier. For example, if the most recently released quarterly release is 19.3, then the upcoming quarterly release is 19.4 (or 2019-09). In this example, ACTIVE_RELEASE_ID for the Parallel-Z pipeline is configured as 2019-09, which enables all feature flags for that release for the tests that run in this phase.
In both parallel phases, the component is launched using launchme.sh, and its dependencies are launched using launch-deps.sh before the tests are executed.
GoldTag: Upon successful completion of all (100%) tests, the base-<commit_id>-<build_number> image built in the BaseImage phase is tagged as gold, and the gold tag of the component’s Docker image is pushed to Quay.io. Along with the gold tag, the component’s client libraries, if enabled in the component’s Jenkinsfile, are also published to JFrog.
Mail: An email notification is sent out to DL_R&D_CI_NOTIFICATIONS alias with the status and links to the Jenkins pipeline.
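The ACTIVE_RELEASE_ID gating used by the Parallel-Y and Parallel-Z phases above can be sketched as follows. This is an illustrative model only: the flag names, the flag table, and the lexicographic comparison of release identifiers are assumptions for this sketch, not the actual feature-flag implementation.

```python
# Hypothetical model of release-id-based feature-flag gating.
# Flag names and introducing releases are invented examples that follow
# the 19.3 (2019-06) / 19.4 (2019-09) discussion above.

FLAG_INTRODUCED_IN = {
    "new-allocation-logic": "2019-06",   # shipped with the 19.3 release
    "revamped-picking-ui": "2019-09",    # targeted at the 19.4 release
}

def enabled_flags(active_release_id: str) -> set:
    """Flags enabled for a given ACTIVE_RELEASE_ID: every flag whose
    introducing release is at or before the active release.
    (YYYY-MM identifiers compare correctly as strings.)"""
    return {flag for flag, rel in FLAG_INTRODUCED_IN.items()
            if rel <= active_release_id}

# Parallel-Y: most recent GA release -> only 19.3 flags are on
assert enabled_flags("2019-06") == {"new-allocation-logic"}
# Parallel-Z: upcoming release -> 19.4 flags are on as well
assert enabled_flags("2019-09") == {"new-allocation-logic",
                                    "revamped-picking-ui"}
```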
Support for additional parallel phases as part of the component’s CI pipeline: Certain components require additional test phases to run alongside the standard Parallel-Y and Parallel-Z phases, depending on the Spring profiles configured for the component in the configuration repository.
The purpose of the component’s pullrequest branch pipeline is to build and test the component’s code from the given pullrequest branch. The built artifacts are never promoted to downstream environments, but the pipeline provides a way for developers to build and test their code before merging it into the master branch. Commits pushed to any branch named pullrequest/<some_suffix> in the component’s Bitbucket repository trigger the pipeline job, which goes through phases similar to those described above for master, but without any parallel phase. Upon successful completion of the pipeline, the Docker image created in the BaseImage phase is tagged as pr-gold and pushed to Quay.io.
While the default behavior of the pullrequest branch pipeline is to execute each of the pipeline phases described above, the developer has control over which phases are actually executed if different behavior is desired. A developer can insert a tag (defined for each phase) into the commit message to control which build phases are executed.
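The phase-control behavior described above might be modeled as follows. The [skip-<phase>] tag syntax and the helper name are hypothetical, purely for illustration; the actual per-phase tags are defined in the pipeline documentation.

```python
# Hypothetical sketch of commit-message tags controlling build phases.
# The [skip-<phase>] syntax is an assumption, not the real tag format.
import re

# Pullrequest pipeline phases (no parallel phase), per the text above.
ALL_PHASES = ["Init", "Clone", "BaseImage", "GoldTag", "Mail"]

def phases_to_run(commit_message: str) -> list:
    """Return the phases that remain after honoring skip tags."""
    skipped = set(re.findall(r"\[skip-(\w+)\]", commit_message))
    return [p for p in ALL_PHASES if p not in skipped]

print(phases_to_run("Fix rounding bug [skip-GoldTag] [skip-Mail]"))
# -> ['Init', 'Clone', 'BaseImage']
print(phases_to_run("Normal commit"))
# -> ['Init', 'Clone', 'BaseImage', 'GoldTag', 'Mail']
```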
The purpose of the component’s priority branch pipeline is to enable developers to make a priority bug fix on a previously released Code Drop that is presently deployed in production environments. Typically, a defect that causes work stoppage or results in a significant loss of productivity at the customer is considered a Priority bug, and is usually escalated to the level of senior executives. When a Priority bug is reported, the expected time window for fixing it and deploying the fix in the production environment is very short (typically 12 to 48 hours). The Priority Bug-Fix Pipeline helps the development team achieve a quicker turnaround in producing a fix for such a defect.
Commits pushed to any branch named priority/<code_drop_id> in the component’s Bitbucket repository trigger the pipeline job, which goes through phases similar to those described above for master, but without any parallel phase. Upon successful completion of the pipeline, the Docker image created in the BaseImage phase is tagged as xgold and pushed to Quay.io.
The purpose of the component’s future branch pipeline is to build and test the component’s code from the given future branch. It allows developers to “branch out” code from master for significant changes or enhancements that may not make it back into master for a considerable period of time, or perhaps ever. In concept, the future branch pipeline is akin to the pullrequest branch pipeline, but it acts and works more like its master counterpart by going through the same detailed build phases as the master branch. The primary difference between the master branch pipeline and the future branch pipeline is that artifacts produced from the future branch are never promoted to a release or deployed to downstream environments.
Commits pushed to any branch named future/<some_suffix> in the component’s Bitbucket repository trigger the pipeline job, which goes through the same phases described above for master. Upon successful completion of the pipeline, the Docker image created in the BaseImage phase is tagged as fgold and pushed to Quay.io.
The framework pipeline job runs once for each framework and is triggered when a new commit is pushed to a target branch of the framework’s Bitbucket repository. The framework pipeline job monitors a set of known branch name patterns for which a build is automatically triggered:
The purpose of the framework’s master branch pipeline is to build and test the framework’s code from its master branch, where the code from other branches is merged and integrated. This pipeline publishes the artifacts (i.e., JARs) that are consumed by the components.
The purpose of the framework’s Release & Maintenance branch pipeline is to maintain and publish a specific version of the framework JARs. The build process is the same as for master.
At the end of the continuous integration pipeline, a microservice Docker image is tagged as a Release Candidate or rc. The RC-tagged Docker images go through a series of production assurance validations. These validations include upgrade tests (to ensure compatibility between versions), user interface tests (to test user experience and mobile applications) and business workflow simulations (to ensure that the end-to-end business scenarios continue operating without regressing). If regressions are found, they are treated as a critical defect and either addressed or toggled off using feature flags so that the code can be released without the regression of the functionality.
As explained above in the master branch build section for the application components, commits pushed to the master branch of a component’s repository result in a build pipeline, at the end of which the newly built (and tested) Docker image is published with two tags to Quay.io:
Gold tag: com-manh-cp-<component_short_name>:gold (for example: com-manh-cp-inventory:gold); and
Absolute tag: com-manh-cp-<component_short_name>:
While the gold tag designates a component’s Docker image as a “stable” image, the image is still not considered ready for release until it goes through additional functional and performance testing. Each component team is responsible for testing the gold-tagged images (commonly, but incorrectly, referred to as “gold images”) and declaring the readiness of an image for release. This is performed by means of the rc tag. The component teams test the gold-tagged image and, upon successful validation, tag the same absolute version as rc using an automated job. In other words, when a Docker image is rc-tagged, for a short period there are three tags, all pointing to the same physical Docker image: gold, the absolute tag, and rc.
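The tag lifecycle described above can be sketched as follows; the digest value, versions, and repository contents are illustrative assumptions, not real pipeline data.

```python
# Sketch of the tag states: base, the absolute tag, gold, and rc all
# point at the same physical image (same digest). Values are invented.

registry = {}  # tag -> image digest

digest = "sha256:1a2b3c"                 # hypothetical image digest
commit_id, build_number = "abcdefg", 42

registry[f"base-{commit_id}-{build_number}"] = digest  # BaseImage phase
registry["1.2.3-abcdefg-1909030621"] = digest          # absolute tag
registry["gold"] = digest                  # green master pipeline
registry["rc"] = digest                    # after component-team validation

# gold, the absolute tag, and rc all resolve to the same image
assert (registry["gold"] == registry["rc"]
        == registry["1.2.3-abcdefg-1909030621"])
```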
Metadata about every deployment artifact that gets delivered to Operations, and deployed in the customer stacks is maintained as a YAML file in release-pipeline repository. The metadata is managed as 5 categories:
cattle: The application components
pets: The essential services
configuration: The application and deployment configuration
binaries: The mobile app builds and Point of Sale Edge Server binaries
infrastructure: The infrastructure tools that are used for deployment and environment management

Each entry in the metadata is (unsurprisingly) called a “metadata entry”. Each metadata entry defines a few attributes:
The group attribute of the metadata entry for each deployment artifact contains one or more values. By associating a deployment artifact with one or more groups, the artifact declares that it “belongs to” those groups. The “group” is an arbitrary value, not a predefined enumeration of values. However, each group value must be meaningful in the context of Manhattan products. Currently, the following groups are defined by one or more deployment artifacts:
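As a rough illustration, a metadata entry with a group attribute might look like the sketch below. All attribute names other than group, and all the values shown, are hypothetical; the actual schema lives in the release-pipeline repository.

```yaml
# Hypothetical shape of a metadata entry in the release-pipeline repository.
cattle:
  - name: component-order        # illustrative attribute names
    image: com-manh-cp-order
    group:
      - oms                      # example group values; groups are arbitrary
      - common                   # but must be meaningful for Manhattan products
```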
Manhattan Active® Platform Code Drops are the instrument by which the development team delivers the set of deployment artifacts to the Cloud Operations team (and, for internal use, to services teams). A code drop is a manifest, or “bill of lading”, containing the set of deployment artifacts, and their versions, that are being released as part of that code drop.
Every code drop manifest consists of 4 sets of deployment artifacts:
A code drop can be released in one of three flavors:
dev-ready: A code drop that is used for internal, development-only purposes. The dev-ready code drop manifest contains the same set of deployment artifacts, but their versions are marked as gold (for Docker images) and latest (for the Configuration zip or mobile app binaries). This allows the application teams to use a dev-ready drop for their development or other internal environments, while ensuring that the standard set of deployment artifacts is available.
stage-ready: A code drop that is used for internal purposes: for development or QA, or for staging the release before it is promoted to downstream environments. The stage-ready code drop manifest contains the same set of deployment artifacts, but their versions are marked as rc (for Docker images) and latest (for the Configuration zip or mobile app binaries). This allows the application teams to use a stage-ready drop for their development or other internal environments, while ensuring that the standard set of deployment artifacts is available.
prod-ready: A code drop that is ready to be released to Operations and promoted to downstream environments. The prod-ready code drop manifest contains the same set of deployment artifacts, but their versions are marked with the actual release-ready versions of the Docker images, Configuration zip, and mobile binaries. At the time of the code drop event (typically at the end of the 2-week period, after completing the necessary testing and validation activities with the deployment artifacts from the stage-ready manifest), the absolute versions of the then-current rc tags of the Docker images, the most recent Configuration zip file, and the mobile binaries are recorded in the prod-ready manifest. As an example, when we prepared the code drop 2.0.60.x on September 27, 2019, the stage-ready versions of the deployment artifacts were recorded as absolute versions, which produced a prod-ready manifest as shown in the example below (compare it with the snippets shown above for the dev-ready and stage-ready manifests - they look nearly identical, except for the versions):
By recording the absolute versions at a given moment, we essentially “lock down” the versions that will be released to the Operations team. This allows the development team to release a precise and static set of versions across the deployment artifacts. In contrast, the dev-ready and stage-ready manifests point to the more “fluid” gold and rc tags, respectively, which are dynamic by design.
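The contrast between the “fluid” and “locked” manifests can be illustrated with a hypothetical snippet (the component name and versions are invented examples, not actual manifest content):

```yaml
# stage-ready: dynamic tag; "rc" moves as new images are validated
com-manh-cp-order: rc

# prod-ready: absolute version locked down at code drop time
com-manh-cp-order: 1.2.3-abcdefg-1909030621
```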
To identify a prod-ready manifest, or a code drop that is delivered to the Operations team, a specific versioning convention of four digits is used: 2.0.XX.YY, where XX and YY are two-digit numbers. The digit XX represents the Code Drop number, and YY represents the Hot Fix number for that code drop. For example, the prod-ready manifest identified as 2.0.59.00 is the original manifest for Code Drop 59. When it is updated to 2.0.59.01 or 2.0.59.13, the last digits represent Hot Fix #1 or #13 of code drop 59, in this example.
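The 2.0.XX.YY convention can be expressed as a small parsing sketch; the helper name is illustrative, not part of any actual tooling.

```python
# Sketch of the code drop versioning convention: 2.0.XX.YY, where XX is
# the Code Drop number and YY the Hot Fix number.

def parse_code_drop_version(version: str):
    major, minor, drop, hotfix = version.split(".")
    assert (major, minor) == ("2", "0"), "code drop versions start with 2.0"
    return int(drop), int(hotfix)

print(parse_code_drop_version("2.0.59.00"))  # (59, 0)  -> original Code Drop 59
print(parse_code_drop_version("2.0.59.13"))  # (59, 13) -> Hot Fix #13
```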
In summary, the dev-ready, stage-ready and prod-ready manifests follow the workflow as described in the diagram below:
As explained in the Code Drop section above, the bi-weekly releases and the shipped manifests are part of a recurring process and hence are scheduled events. However, there are times when high-priority CIIs (usually reporting high- or critical-impact defects, or requesting other time-sensitive changes) are raised by the services team, Operations, or the customers (including internal customers such as PA or Sales). The requester may not be able to wait until the next code drop for the requested changes to be released and deployed in their environments. In these cases, upon approval by senior leaders of the application teams (usually at least a Director-level lead), a change may be allowed to be released as a “Hot Fix” on the most recently shipped code drop.
A Hot Fix request is typically fulfilled by delivering a Docker container (if the desired change was made in an application component or an essential service), a new Configuration zip (if the change is in the configuration repository), or a new mobile binary. Regardless of the type of deployment artifact delivered as a Hot Fix, the change is recorded in the most recent prod-ready manifest in the release-pipeline repository branch specific to that code drop, and a new version of the manifest is delivered to the Operations team.
As an example, assume that component-order was released as com-manh-cp-order:1.2.3-abcdefg-1909030621 as part of, say, code drop 2.0.58.00. For a CII reported against component-order, a new Hot Fix version was built with a change. Assume that the revised version is com-manh-cp-order:1.2.4-ghijklm-1909132056. To release this Hot Fix, the prod-ready manifest of the code drop 2.0.58.x is updated to record the new version of component-order, and the version of the manifest is incremented to 2.0.58.01.
Hot Fixes for a given Code Drop are cumulative. In other words, when the Code Drop 2.0.60.00 is initially delivered to the Operations team, it contains no hot fixes. With each hot fix (or set of hot fixes made available at the same time) the manifest’s version number is incremented to 2.0.60.01, 2.0.60.02, and so forth. It is possible that a single Hot Fix increment may have a single change (e.g., component-order only) or multiple changes (e.g., component-order and component-payment). However, the last digit of the code drop version is always incremented by 1. In other words, when 2.0.60.02 is delivered, the Operations teams will also get the Hot Fix delivered as part of 2.0.60.01.
Manhattan Active® Platform is built, delivered, and supported by Manhattan R&D engineering. The engineering teams follow distributed development and continuous integration practices. The development teams responsible for individual microservices are small and, in most cases, geographically collocated. This document describes the practices, procedures, and tooling that support the DevOps and Reliability Engineering operations:
This section covers the “left” side of the DevOps practices in Manhattan, describing how developers code, build, test and release the software binaries for deployments in customer environments:
The typical development process can be summarized as the following:
The microservice codebase goes through a set of distinct phases in the continuous integration pipeline. In each phase, a special category of tests is executed for the microservice code, and only upon successful completion of all tests, the code moves to the next phase. These phases also act as the quality gates and ensure that code is release-ready only if successfully passes through all quality gates. Each development team develops, builds, and tests their microservices in isolation or with minimal dependencies from other microservices. This model allows for independent development, and loose coupling between microservices. Where necessary, the tests use mocks to represent its dependencies, which allows every microservice to maintain and adhere to service contracts for backward compatibility of interactions between microservices.
At the end of the continuous integration pipeline, a microservice Docker image is tagged as a Release Candidate or rc. The RC-tagged Docker images go through a series of production assurance validations. These validations include upgrade tests (to ensure compatibility between versions), user interface tests (to test user experience and mobile applications) and business workflow simulations (to ensure that the end-to-end business scenarios continue operating without regressing). If regressions are found, they are treated as a critical defect and either addressed or toggled off using feature flags so that the code can be released without the regression of the functionality.
After production assurance vetting, the release candidate Docker images go through a process of release to the Manhattan Active® Operations team, where they are promoted to the customer environments - first to the customer’s lower lifecycle environment, followed by the customer’s production environment. Manhattan Active® Operations uses a set of deployment tools built by Manhattan engineering teams to perform the customer environment deployments and upgrades. The deployment tools work natively with Kubernetes and Google Kubernetes Engine to execute the deployment tasks.
Manhattan fully understands the criticality of security in software engineering and the need to build a stronger security posture starting with the foundation: the code. The topics below summarize the key security aspects that relate to secure software development practice at Manhattan:
The platform and product codebase is maintained in private Git repositories in Atlassian Bitbucket. Access to these repositories is authenticated via either HTTPS (using OAuth2 SSO with the Manhattan corporate directory) or SSH (using a Git SSH private key). Specific user groups with access permissions, such as read, write, and admin, are created to control access to the code repositories. Users, including automated systems such as the Continuous Integration and Release Pipeline, are then assigned to the designated groups based on their role and development needs.
Likewise, the built binaries, such as Docker containers or JAR files, are stored in private binary repositories such as Google Container Registry and the JFrog Maven repository. Access to the binary repositories is also controlled via OAuth2 SSO with the Manhattan corporate directory and Docker login credentials. Access controls apply uniformly to Manhattan development staff and to the automated build, testing, and release subsystems.
Manhattan uses a combination of security development practices for ensuring the codebase released to run in production is free of security vulnerabilities and to identify new vulnerabilities or security “hot spots” that may be introduced either in the code that the developers committed or in the 3rd party libraries that are included in the built binaries:
Manhattan engineering teams follow DevOps principles and consider customer deployments and operations as integral parts of the development process. While roles, responsibilities and skill sets separate the development and operation staff, the overall team is part of a single R&D organization, and every member of the organization is responsible for accuracy, consistency and validity of the development and operations processes. A few of the key DevOps practices followed by Manhattan R&D are summarized below:
The Manhattan Reliability Engineering (or MRE) team is part of Manhattan Active® Operations under the Manhattan R&D organization, and consists of software engineers, deployment architects, and system engineers primarily responsible for ensuring the stability, availability, security, and lower cost of ownership of the customer environments. The MRE team is tasked with building, instrumenting, and operating a set of automation and management tools used to perform key tasks crucial to maintaining reliable systems:
The MRE team follows the reliability workflow described below for preventing, diagnosing, and treating system or performance issues.
The MRE team focuses on proactively preventing problems by continuously monitoring the customer environments and treating the early signs of problems. However, when incidents do occur, the MRE team relies on reactive alerting mechanisms and troubleshooting tools to diagnose the symptoms and potentially address problems before they result in service degradation or outages. Significant focus is put on two key concepts of reliability engineering to strengthen the stability and efficiency of Manhattan Active® Platform:
Mean Time to Repair: Measuring the engineering performance based on how quickly the team can repair or restore an outage. MTTR is directly relevant to the customer as it is a measure of the amount of downtime the customer sees. The MRE team is responsible for continuously improving the monitoring ecosystem with the primary goal of reducing the MTTR.
Automation: Systems, processes and tooling that can be automated help with improving consistency of deployment behavior, minimize the potential for human errors, reduce the operational costs, and create the ability to learn & forecast from previous problems. Automation reduces toil and enables engineering teams to focus more on strategic initiatives. The MRE team is responsible for building and operating sub-systems for automatic deployment & upgrade processes, self-healing, and self-service.
Manhattan Active® Platform consists of a monitoring ecosystem for automating and managing consistency, reliability, and resilience of the customer environments. Some key constituents of the ecosystem are listed below:
Manhattan Active® Platform is integrated with Elasticsearch, Fluentd and Kibana as the toolset at the core of its logging subsystem. These services are instrumented with SLF4J and Docker to stream the logs from the stack components into the Logs Collector:
Additionally, the Logs Collector has a set of internally available APIs that can be used by other tools to read the log data for generating alerts and visualizations.
Manhattan Active® Platform uses Prometheus, Grafana and Alert Manager as the toolset at the core of its monitoring subsystem:
Metrics Collector collects and persists three types of metrics data:
The Monitor is a platform component of the Manhattan Active® Platform that performs specific operations on the logs and metrics data to make them usable for alerting on application conditions. The Monitor uses internal APIs provided by the Logs Collector (via Elasticsearch) and the Metrics Collector (via Prometheus) to scrape for a predefined set of patterns and labels, and churns alert-able data from the raw logs and metrics. The output is read by the Metrics Collector, which can then produce alerts for specific conditions determined by the Monitor. Examples of some of these conditions are listed below:
Additionally, the Monitor also integrates every deployment stack with Keyhole - the global monitoring dashboard - such that specific alerting data can be streamed directly from the deployment stack to Keyhole, allowing for near real-time notifications of alert conditions.
Keyhole is the global monitoring dashboard built using Grafana and Prometheus running on Google Cloud Run platform. Keyhole receives alert notifications from all customer environments, and depending on the severity of alerts, it derives and indicates a deployment stack’s health, along with the recent trend of critical alerts. Keyhole also has an ability to drill down into each deployment stack’s monitoring dashboard from the global dashboard, allowing for easy access to individual customer environments directly from the global dashboard.
Manhattan Active® Operation teams continuously monitor Keyhole dashboard to get a comprehensive, and real-time view of the customer environments across the globe.
Slammer is the global alert capture and analysis database built using Google BigQuery and Google Cloud Run platform. Slammer receives alert notifications from all customer environments. Using the historical alert data, Slammer can perform various analytics such as alert trends, service quality and SLO analysis.
Slammer periodically publishes the report of key alerts and their trends from the customer environments, providing crucial learnings that are used for continuous improvements and predictions for future scenarios under similar conditions.
Arecibo is the global activity recorder that captures system interactions from all deployment stacks and persists them in a database built on Google BigQuery. Arecibo enables a centralized view of system interactions such as inbound or outbound HTTP calls, asynchronous message transmissions, and extension-point invocations. The activities recorded by Arecibo are helpful in identifying latency and performance of these system interactions.
Arecibo periodically publishes the report of latency and performance data of system interactions across all deployment stacks. These reports are used for performance engineering and tuning of the system infrastructure and application code.
Wiretap is a platform component as part of the Manhattan Active® Platform, which performs specific operations on the application data to produce functional metrics, which are then scraped and persisted by the Metrics Collector as time-series data. These metrics are subsequently inspected for alert conditions to reveal potential inconsistencies in business transactions. Wiretap provides insights into the symptoms of functional problems, or conditions that could lead to such problems.
Examples of some alert conditions that Wiretap captures are listed below:
Sonar is the synthetic probe that keeps track of the health of every HTTP endpoint deployed as part of the Manhattan Active® Platform. Sonar “pings” the HTTP endpoints across customer environments at a scheduled frequency and records the HTTP status of the response. If the status, latency, or availability of the response does not meet the previously defined criteria, Sonar reports the error condition as an alert to be evaluated and acted upon by MRE.
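Sonar’s check logic, as described above, can be approximated with a sketch like the following; the threshold value, function signature, and alert criterion are assumptions for illustration, not Sonar’s actual contract.

```python
# Sketch of a Sonar-style synthetic check: compare a probe result against
# predefined criteria and flag an alert condition. Values are illustrative.

def evaluate_probe(status: int, latency_ms: float,
                   max_latency_ms: float = 2000.0) -> bool:
    """Return True if the endpoint should raise an alert."""
    healthy = status == 200 and latency_ms <= max_latency_ms
    return not healthy

assert evaluate_probe(200, 150.0) is False   # healthy response, no alert
assert evaluate_probe(503, 150.0) is True    # bad status -> alert
assert evaluate_probe(200, 5000.0) is True   # too slow -> alert
```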
Lacework is a security tracing solution that helps the MRE team identify anomalous activity that deviates from normal behavior in a customer environment and may indicate a threat. Lacework leverages behavioral models based on the usage patterns of the system, and reports unusual behaviors, invocations, and software components, which can then be analyzed by the security analysts to detect threats, risks, and potential exposures.
Pinpoint APM is an open-source application performance monitor agent and visualizer, integrated with the Manhattan Active® Platform. Pinpoint APM agent captures performance metrics and stack traces via bytecode instrumentation and persists these metrics in HBase. These metrics can then be analyzed with Pinpoint APM web console to identify performance bottlenecks or other symptoms of performance degradation.
The graphic below summarizes the monitoring, alerting, and reporting ecosystem deployed as part of the Manhattan Active® Platform:
Manhattan Associates monitors our active platform for availability, events, and security.
We use Prometheus to gather application status via Kubernetes and a custom service (Snitch) that queries component endpoints for health and other detailed information.
Alertmanager works with Prometheus, which raises events based on configured rules. These events are then sent to our centralized event collection service, Slammer (Service Level Agreement manager). Slammer holds any critical events to allow self-resolution or self-healing to take place. If a resolution has not been received in the allotted time, Slammer creates an alert and sends it to our paging service (PagerDuty).
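As a hedged illustration of the kind of rule Prometheus evaluates in this flow, the fragment below writes a minimal alerting-rule file. The job name `snitch`, the rule name, and the 5-minute grace period are assumptions for this sketch, not the platform's actual rules; the `for:` clause mirrors the idea of giving self-healing a chance before an event escalates.

```shell
# Write an illustrative Prometheus alerting rule file.
# Names and thresholds are placeholders, not the real configuration.
cat > component-health.rules.yml <<'EOF'
groups:
  - name: component-health
    rules:
      - alert: ComponentDown
        expr: up{job="snitch"} == 0
        for: 5m                     # grace period before the event escalates
        labels:
          severity: critical
        annotations:
          summary: "Component {{ $labels.instance }} is unreachable"
EOF
```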
We also have a custom service, Manhattan Sonar, that actively pings the external endpoints our customers use to verify that they are available. If a service is unavailable, Sonar sends the event to Slammer via a webhook.
Slammer stores all events in a data lake, which we use to identify trends and produce daily reports for internal big-picture consumption.
From a security perspective, Manhattan Associates' cloud security team monitors all of our customer and internal environments with Lacework. Lacework uses machine learning to surface unusual activities as well as security vulnerabilities and Common Vulnerabilities and Exposures (CVEs). This allows prompt identification and resolution of events of interest.
The Manhattan Active® Network Operation Center (or NOC, for short) is the team primarily responsible for monitoring the health and availability of the customer environments. The NOC team is tasked with initiating the resolution workflow when a system or functional alert is triggered.
A variety of curated and refined set of system and functional alerts are defined as part of Manhattan Active® Platform deployments. These alerts fall in two broad categories:
The lists below include some key proactive and reactive alerts that are monitored by the NOC team:
The workflow shown below describes the rules of engagement and communication for resolving alerts, with a singular focus on minimizing downtime or degradation and meeting or exceeding the SLAs of the Manhattan Active® solutions.
Manhattan Active® Platform is deployed as a distributed application using Google Kubernetes Engine on Google Cloud Platform. MA products expose two HTTPS endpoints by default: the Authentication endpoint and the Application endpoint. These endpoints are exposed externally using the NGINX Ingress Controller, an Ingress controller that manages external access to HTTP services in a Kubernetes cluster using NGINX. All configurable traffic routing via Ingress resources, as well as TLS termination for the Manhattan application, is done at the ingress controller level. Currently, the NGINX ingress controller exposes the MA application to the outside world through the cloud provider's TCP load balancer.
All application and authentication URLs are mapped to the cloud load balancer with publicly resolvable DNS names.
All traffic to, from, and within Manhattan Active® Platform is encrypted with TLS v1.2 by default. The inbound HTTPS load balancer listens on port 443. The HTTPS endpoints accept traffic from the Manhattan Active® Platform web user interface, mobile applications, and REST clients. The ingress controller is configured to serve the application over HTTPS, offloading this functionality from the application. We have configured the SSL certificate in the ingress controller for all the Ingress resource rules created for the MA application.
We have also automated the lifecycle of the TLS certificates with cert-manager, using Let’s Encrypt as the certificate authority. Cert-manager automates the provisioning of HTTPS certificates within the Kubernetes cluster and provides custom resources to simplify the provisioning, renewal, and use of those certificates.
With NGINX, we can achieve end-to-end encryption of all requests in addition to making Layer-7 routing decisions. In this case, clients communicate with NGINX over HTTPS; NGINX decrypts the requests and then re-encrypts them before sending them downstream to the application gateway, the Zuul server.
NGINX can handle SSL/TLS client certificates and can be configured to make them optional or required. Client certificates are a way of restricting access to the application to only the authorized clients without requiring a password. We can control the certificates by adding revoked certificates to a certificate revocation list (CRL), which NGINX checks to determine whether a client certificate is still valid.
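A server-block fragment along these lines might look as follows. This is an illustrative sketch, not the platform's actual NGINX configuration: all file paths are placeholders, and `ssl_verify_client` is shown as `optional` but can be set to `on` to require a client certificate.

```shell
# Write an illustrative NGINX fragment showing optional client-certificate
# verification against a CRL, as described above. Paths are placeholders.
cat > client-cert.conf <<'EOF'
server {
    listen 443 ssl;
    ssl_certificate         /etc/nginx/tls/server.crt;
    ssl_certificate_key     /etc/nginx/tls/server.key;
    ssl_client_certificate  /etc/nginx/tls/client-ca.crt;  # trusted CA for client certs
    ssl_verify_client       optional;                      # or "on" to require one
    ssl_crl                 /etc/nginx/tls/ca.crl;         # revoked certificates
}
EOF
```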
Cert-manager adds certificates and certificate issuers as resource types in a Kubernetes cluster and simplifies the process of provisioning, renewing, and using these certificates. It can issue certificates from a variety of supported sources, including Let’s Encrypt, HashiCorp Vault, and private PKI. Cert-manager ensures the validity of the certificates and automatically renews them at a configurable time before expiration.
Manhattan Active® Platform uses Let’s Encrypt, a global Certificate Authority (CA). The cert-manager ecosystem has API-based automation built around Let’s Encrypt to provision and renew certificates, which are then distributed to the deployment stacks via a Kubernetes CronJob that detects the need for renewal and requests a renewed certificate from the central certificate governing system. Let’s Encrypt also serves as a platform for advancing TLS security best practices: all certificates issued or revoked are publicly recorded and available for anyone to inspect.
Cert-manager runs within a dedicated tools Kubernetes cluster as a series of Deployment resources. It uses CustomResourceDefinitions to configure certificate authorities and request certificates. Along with cert-manager, we have configured Let’s Encrypt as the ClusterIssuer resource, which represents a certificate authority. For every customer, we create a wildcard domain certificate that is injected as a Kubernetes Secret used by the Ingress resources in the cluster. The Let’s Encrypt CA uses the DNS-01 challenge to validate the domain name before issuing the certificate.
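For illustration, a Let’s Encrypt ClusterIssuer with a DNS-01 solver and a wildcard Certificate could look something like the fragment below. This is a sketch only: the issuer name, email, domain, DNS solver, and secret names are all placeholders, not the platform's real resources.

```shell
# Write illustrative cert-manager resources: a Let's Encrypt ClusterIssuer
# using the DNS-01 challenge, and a wildcard Certificate whose issued
# certificate lands in a Kubernetes Secret. All values are placeholders.
cat > letsencrypt-example.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - dns01:
          cloudDNS:
            project: example-project
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: customer-wildcard
spec:
  secretName: customer-wildcard-tls   # consumed by the Ingress resources
  dnsNames:
    - "*.customer.example.com"
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
EOF
```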
Cert-manager automatically renews certificates. It calculates when to renew a certificate based on the issued X.509 certificate’s duration and a ‘renewBefore’ value, which specifies how long before expiry a certificate should be renewed. The default duration configured in Manhattan Active® Platform is 90 days. A CronJob running on the Manhattan tools cluster periodically runs the helper scripts bundled with cert-manager to generate or renew SSL certificates from Let’s Encrypt.
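The expiry check such a renewal job performs can be sketched offline with `openssl`. The certificate below is a throwaway self-signed one (not a real Let’s Encrypt certificate), issued for 90 days to mirror the platform's default duration; the 30-day window stands in for a hypothetical `renewBefore` value.

```shell
# Issue a throwaway 90-day self-signed certificate for the demonstration.
openssl req -x509 -newkey rsa:2048 -nodes -days 90 \
  -subj "/CN=demo.example.com" \
  -keyout demo.key -out demo.crt 2>/dev/null

# -checkend N exits 0 if the certificate is still valid N seconds from now;
# a renewal job would trigger when this check starts failing.
if openssl x509 -checkend $((30 * 24 * 3600)) -noout -in demo.crt >/dev/null; then
  echo "certificate valid beyond the renewal window"
else
  echo "renewal needed"
fi
```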
All generated and renewed certificates are stored on shared storage used by the sidekick application. Each customer environment downloads the SSL certificate from the sidekick application and exposes the certificate via a Kubernetes Secret. A dedicated CronJob running in each customer environment validates and fetches the certificate before it expires.
Zuul is the API gateway in the Manhattan Active® Platform that provides dynamic routing, monitoring, resiliency, security, and more. Traffic matching the NGINX ingress controller’s Ingress rules is sent to the Zuul server first, and Zuul then delegates authenticated user requests to the other application components running in the microservice deployment. Zuul itself runs as another microservice on the Kubernetes cluster. The authentication service, called the OAuth server, is another Manhattan platform component responsible for both authentication and authorization. All client-initiated traffic that is not yet authenticated is redirected by the Zuul server to the auth server, which completes the authentication by generating valid JWT tokens.
Zuul and the auth service are exposed to the outside world via the NGINX ingress controller, with Ingress resource routing rules created for both. When client traffic hits the NGINX ingress controller via the cloud load balancer, NGINX compares the Host header of the user traffic against the Ingress rules on the controller; if the header matches, it forwards the traffic to the Zuul server. For a first-time login, Zuul redirects the call to the auth server, which is routed through another NGINX Ingress rule, to perform authentication.
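Host-based routing of this kind might be expressed in an Ingress manifest along the following lines. This is an illustrative sketch: the hostnames, service names, ports, and TLS secret name are placeholders, and the `backend-protocol: "HTTPS"` annotation reflects the re-encryption of traffic to the backends described earlier.

```shell
# Write an illustrative Ingress with host-based rules routing to the Zuul
# and auth services. All names and hosts are placeholders for this sketch.
cat > app-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"  # re-encrypt to backends
spec:
  ingressClassName: nginx
  tls:
    - hosts: ["app.customer.example.com", "auth.customer.example.com"]
      secretName: customer-wildcard-tls
  rules:
    - host: app.customer.example.com        # matched against the Host header
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: zuul-server, port: { number: 443 } }
    - host: auth.customer.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: auth-server, port: { number: 443 } }
EOF
```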
The manh Hugo theme includes sample archetypes that can get you started with a new Markdown file based on the kind of document. The starting document includes YAML metadata (front matter) that influences the way the document is displayed. The steps to create a new document are as follows:
Many developers may be contributing to this repo, and theme changes may have occurred since you last worked in it, so make it a habit to pull frequently.
```shell
git checkout main
git pull --recurse-submodules
```
Create a new branch for your changes which can later be reviewed / approved through a pull request.
Please follow a standard branch naming convention:
- use all lowercase letters
- use - for spaces
- provide a short name of the document being created
```shell
git checkout -b new-blog-entry-2022-05-02
```
A series of sub folders below the top level section provide additional organization. Each document is created as a markdown file within its own folder along with an images folder.
| Folder | Description |
| ------ | ----------- |
| docs/faq | Frequently Asked Questions - 1 Document per Question |
| docs/guides | General Guides and Other Documents |
```shell
# Example for creating a new Blog in the "article" sub section
hugo new blog/article/2022-05-02-example
```
Update the YAML section at the top of the index.md file created for your document. Important items to update include:
Comments should help explain the variables and their use, feel free to remove / update comments in this section as part of your documentation.
Finally! Create the documentation that you want to share with your colleagues and the development community. The starting document has some examples of Markdown to help. Using your favorite code editor, you can keep an eye on how the markdown is going to look.
VS Code users: here is a handy extension - doc markdown
Once you have a good working draft, you can review how it will appear in the site by running Hugo on your local machine.
```shell
# Starts the local server hosting the pages
# by default, this is http://localhost:1313/
hugo server
```
NOTE: You can live-edit your documentation, so snap your browser and editor side by side and edit live.
Once you have completed your documentation, commit and push your changes to the remote repo.
```shell
# Supply a commit message and push your changes
git add -A
git commit -m "initial draft"
git push --set-upstream origin new-blog-entry-2022-05-02
```
Now submit your pull request for another colleague to review your documentation.