1 - High Availability and Scalability

Understand how Manhattan Active® Platform handles high availability and scalability of the runtime components and the underlying compute.

Introduction

Manhattan Active® Platform utilizes the power and flexibility of Kubernetes to ensure high availability of compute resources, services, and application components while allowing for horizontal scaling of synchronous and asynchronous workload processors to meet volume requirements. Manhattan Active® Platform is built on the principles of redundancy and concurrency, making it adaptable to inevitable compute failures, resource starvation, and application faults.

Maintaining Availability & Uptime

Manhattan Active® Platform is built with a 3-dimensional fault-tolerance design to maintain high availability and uptime of the deployed applications:

  • Availability of Persistence: The database that maintains the transactional data is deployed as two instances in an active-passive HA topology, spread across two availability zones. If the primary database instance becomes unavailable, connections switch over automatically and transparently to the secondary database instance. Likewise, other persistent services are deployed as clusters of two or more nodes with independent persistent volumes across availability zones.
  • Availability of Compute: The Kubernetes cluster that hosts Manhattan Active® Platform runtime is deployed in multiple availability zones within a geographical region. Redundancy of compute resources and network across availability zones allows for contingency in case of potential failures or unavailability of one of the zones in the region.
  • Availability of the Application: The Kubernetes Deployment and StatefulSet that run the application components and system services are deployed (and auto-scale) as multiple replicas. This allows for contingency in case of potential failures or unavailability of one of the replicas of the application components or services. Moreover, system and application updates are applied using a “rolling” strategy to ensure zero downtime upgrades.

All customer environments are deployed using a standard deployment framework that follows a consistent deployment topology, with variations based on the environment designation:

  • Production: regional deployments (3 zones), with database HA.
  • VPT: regional deployments (3 zones), without database HA.
  • Staging and Development: zonal deployments (1 zone), without database HA.
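
The topology variations listed above can be captured as a small lookup table. The following is an illustrative sketch only: the `Topology` type, the `TOPOLOGIES` table, and the `is_regional` helper are hypothetical names and are not part of Manhattan's deployment framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical record describing the shape of an environment."""
    zones: int          # number of availability zones the stack spans
    database_ha: bool   # whether the database runs in an HA topology


# Values are taken from the list above; the keys are illustrative labels.
TOPOLOGIES = {
    "production": Topology(zones=3, database_ha=True),
    "vpt": Topology(zones=3, database_ha=False),
    "staging": Topology(zones=1, database_ha=False),
    "development": Topology(zones=1, database_ha=False),
}


def is_regional(designation: str) -> bool:
    """A deployment is regional when it spans more than one zone."""
    return TOPOLOGIES[designation].zones > 1
```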

Disaster Recovery

A regional deployment of the Manhattan Active® Platform stack across multiple availability zones provides high availability and fault tolerance, protecting against partial or full unavailability of a zone. However, a regional deployment does not protect against the loss of an entire compute region due to a significant outage in that region. To ensure business continuity during such events, Manhattan offers an optional disaster recovery deployment of Manhattan Active® Platform in a remote compute region located geographically away from the primary region.

When the disaster recovery option is selected, Manhattan commits to well-defined recovery objectives as part of the service level agreement of the customer contract:

  • The Recovery Point Objective (RPO) is the age of files that must be recovered from backup storage for normal operations to resume if a computer, system, or network goes down because of a hardware, program, or communications failure. The Manhattan Active® Platform DR option commits to an RPO of less than or equal to 1 hour.
  • The Recovery Time Objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) to avoid unacceptable consequences associated with a break in business continuity. The Manhattan Active® Platform DR option commits to an RTO of less than or equal to 4 hours.
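
As a rough illustration of how the two objectives are measured, the sketch below checks a hypothetical incident timeline against the committed values. The function names and the timestamps are assumptions made for this example; only the 1-hour and 4-hour limits come from the text above.

```python
from datetime import datetime, timedelta

RPO = timedelta(hours=1)  # committed recovery point objective
RTO = timedelta(hours=4)  # committed recovery time objective


def meets_rpo(last_replicated_at: datetime, outage_at: datetime) -> bool:
    """Data written after the last replication point is at risk,
    so that gap must fit within the RPO."""
    return outage_at - last_replicated_at <= RPO


def meets_rto(outage_at: datetime, restored_at: datetime) -> bool:
    """The service must be restored within the RTO after the outage begins."""
    return restored_at - outage_at <= RTO


# Hypothetical incident timeline, for illustration only.
outage = datetime(2024, 1, 1, 12, 0)
rpo_ok = meets_rpo(datetime(2024, 1, 1, 11, 20), outage)   # 40-minute gap
rto_ok = meets_rto(outage, datetime(2024, 1, 1, 15, 30))   # restored in 3.5 hours
```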

The Manhattan Active® Platform disaster recovery deployment is an instance identical to the primary instance. The DR instance is deployed with the minimal workload necessary to act as the “warm” data backup of the primary instance. Business data is replicated or synchronized to achieve the committed RPO and RTO. In the event of a disaster, Manhattan Active® Platform Operations will follow the standard operating procedure summarized below:

  • Bring up the remainder of the application components and persistent services to activate the disaster recovery instance as the primary instance.
  • Rewire the application endpoints via the Cloud DNS so that the application URLs in use by the customer remain unchanged.
  • Work with customers following standard service interruption protocols. The Customer Status page will be updated to reflect the DR event.

The Manhattan Active® Platform disaster recovery deployment is enabled for customers purchasing the option. The deployment is tested and audited by a certified 3rd party semi-annually as part of Manhattan’s SOC 2 compliance programs. Contact Manhattan Sales or Professional Services for more information about implementing Disaster Recovery.

The illustration below summarizes the regional (primary instance with multiple availability zones) and multi-regional (additional disaster recovery instance) deployments of Manhattan Active® Platform:

Scalability

Having the ability to increase compute capacity elastically proportional to the increasing traffic and volume is essential to maximizing the throughput and performance of software deployment. Conversely, the ability to shrink compute capacity when the volume of business workload shrinks is also key to reducing the overallocation of compute resources, costs, and the carbon footprint of the deployment. Scalability is an integral part of the core architecture of Manhattan Active® Platform, enabling optimal throughput while minimizing the total cost of ownership for Manhattan and the customers.

Container Auto-scaling

Manhattan Active® Platform relies on concurrency and disposability of its microservices architecture and the power of Kubernetes to enable horizontal scaling of the deployments. Horizontal scalability allows Manhattan Active® Platform to minimize application downtime and scale the application seamlessly, while continuously adjusting the size of the deployment to handle the spikes and drops in workloads.

Manhattan Active® Platform consists of an auto-scaling engine at the core of its architecture. The engine comprises two key components:

  • Controller is the “brain” behind the auto-scaling; it continuously monitors various system metrics such as incoming traffic, processing rate, queue depths, average volume served by a specific business function, and the CPU utilization across the Kubernetes cluster. By evaluating these metrics, Controller computes the need for scaling and issues commands to Conductor to perform the scaling operation.
  • Conductor is the “brawn” behind the auto-scaling; it receives commands from Controller and performs the scaling operations. Conductor manages the Deployment and StatefulSet of the Kubernetes cluster and invokes the Kubernetes API to update the scaling parameters of the underlying deployments. Conductor additionally performs maintenance functions such as node taints, workload movements, container reboots, and rolling updates to help with scaling and cleanup.
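
The division of labor between the two components can be sketched as follows. This is a simplified illustration under stated assumptions, not the platform's implementation: the `Metrics` fields, the backlog-drain formula, and the in-memory dictionary standing in for the Kubernetes API are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    queue_depth: int              # messages waiting to be processed
    rate_per_replica: float       # messages one replica handles per second
    drain_target_seconds: float   # how quickly the backlog should be cleared


def controller_decide(metrics: Metrics, min_replicas: int, max_replicas: int) -> int:
    """Controller role: turn observed metrics into a desired replica count."""
    needed = math.ceil(
        metrics.queue_depth / (metrics.rate_per_replica * metrics.drain_target_seconds)
    )
    # Never scale below the floor or above the ceiling.
    return max(min_replicas, min(max_replicas, needed))


class Conductor:
    """Conductor role: apply the scaling command to the target workload.

    A dictionary stands in for the Kubernetes API in this sketch.
    """

    def __init__(self):
        self.replicas = {}

    def scale(self, workload: str, replicas: int) -> None:
        self.replicas[workload] = replicas


conductor = Conductor()
desired = controller_decide(Metrics(6000, 10.0, 60.0), min_replicas=2, max_replicas=20)
conductor.scale("order-service", desired)  # "order-service" is a made-up name
```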

Horizontal scaling performed by the auto-scaling engine can be classified into two categories:

  • Responsive, or Reactive scaling: Scaling performed as a response to the increasing or decreasing size of the business workload based on the real-time computations performed by Controller. Responsive scaling is important for the application to cope with bursts and slumps in the incoming volume or throughput requirements.
  • Managed, or Proactive scaling: In certain cases, additional resources may need to be in place ahead of time when there are predictable increases in incoming volume or throughput requirements. Managed scaling may be predictive, scheduled, or rules-based:
    • Predictive scaling: The auto-scaling engine determines the future scaling needs based on the historical trend of the throughput requirements.
    • Scheduled scaling: A recurring definition of the scaling needs that is put in place ahead of time based on prior experience.
    • Rules-based scaling: A decision maker defines the scaling rules ahead of time, which results in scaling commands being executed before the burst or drop in volume.
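
One way to picture how scheduled scaling could coexist with responsive scaling is a schedule that sets a replica floor, with the reactive computation free to go higher. The sketch below assumes this "take the larger value" policy and a made-up schedule; neither is documented platform behavior.

```python
from datetime import time

# Hypothetical recurring schedule: (start, end, minimum replicas).
SCHEDULE = [
    (time(8, 0), time(18, 0), 6),   # business hours
    (time(18, 0), time(22, 0), 3),  # evening
]
DEFAULT_MIN_REPLICAS = 1


def scheduled_floor(now: time) -> int:
    """Return the replica floor for the window that contains `now`."""
    for start, end, replicas in SCHEDULE:
        if start <= now < end:
            return replicas
    return DEFAULT_MIN_REPLICAS


def desired_replicas(reactive: int, now: time) -> int:
    """Combine the proactive floor with the reactive computation."""
    return max(reactive, scheduled_floor(now))
```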

Compute Auto-scaling

Conductor performs updates of the desired replica counts on the target Deployment or StatefulSet, effectively updating the number of running container Pods. When the CPU or memory requirements to run these Pods go beyond the currently available compute capacity, Kubernetes issues commands to deploy additional virtual machine nodes to the node pool. Likewise, when the CPU or memory requirements shrink, Kubernetes will accordingly adjust the node pool sizes by freeing up surplus compute capacity.

Compute scaling based on the capacity requirements is managed by Kubernetes based on configured values of minimum, desired, and maximum node counts of the underlying node pools that allocate the virtual machines (or “nodes”). As illustrated below, the node pools scale up and down proportionately to the volume trend. With increasing volumes, the node pools will add new nodes while draining the nodes when the volume decreases.
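
The relationship between Pod demand and node pool size can be approximated with simple arithmetic. The sketch below is a deliberate simplification: it considers CPU only, ignores memory and scheduling constraints, and uses hypothetical parameter names. The actual node pool sizing is decided by Kubernetes.

```python
import math


def nodes_required(pod_count: int, cpu_per_pod: float, cpu_per_node: float,
                   min_nodes: int, max_nodes: int) -> int:
    """Estimate the node count for the requested CPU, clamped to the
    configured minimum and maximum of the node pool."""
    raw = math.ceil(pod_count * cpu_per_pod / cpu_per_node)
    return max(min_nodes, min(max_nodes, raw))
```

For example, with 8-CPU nodes and a pool configured for 3 to 10 nodes, 100 Pods requesting 0.5 CPU each would need 7 nodes, while 200 such Pods would be capped at the maximum of 10.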

Author

  • Kartik Pandya: Vice President, Manhattan Active® Platform, R&D.

2 - Manhattan Active® Platform Security

Basic concepts of the security principles embedded in the architecture of the Manhattan Active® Platform for a strong security posture.

Introduction

Security of the customer’s data, intellectual property and business workflows is one of the highest priorities of Manhattan. Special emphasis has been put to ensure that the security aspects of communication, authentication, authorization, and access control are designed and implemented with a mindset of a security-driven software development lifecycle. Moreover, the architecture and design of Manhattan Active® Platform are continuously enhanced based on the feedback from various forms of security testing (such as static, dynamic and penetration) and reviews by InfoSec.

The topics below highlight the key design considerations of Manhattan Active® Platform security.

Deployment Security

Manhattan Active® Platform is deployed as a distributed application using Google Kubernetes Engine on Google Cloud Platform. Each customer environment is deployed as a single-tenant stack, isolated from the same customer's other stacks and from other customers' environments. Isolation between environments creates a two-dimensional sandbox for each environment:

  • Resource isolation: Compute resources and application runtime are grouped in separate Google projects, one per customer environment. Isolation at the project level provides physical separation of processes, virtual machines, database and storage between environments, forbidding resource access across environments.
  • Network isolation: Networks and the CIDR for each customer environment are unique and independent to restrict access from one environment into another. The network firewall and ACL rules for each environment forbid network-level access across environments.
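
The requirement that each environment's CIDR be unique and independent can be verified mechanically. The sketch below uses Python's standard `ipaddress` module to find overlapping ranges; the environment names and address ranges are invented for the example.

```python
from ipaddress import ip_network
from itertools import combinations


def overlapping_pairs(cidrs: dict) -> list:
    """Return the pairs of environments whose CIDR ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical ranges: env-c collides with env-a, env-b is independent.
conflicts = overlapping_pairs({
    "env-a": "10.0.0.0/20",
    "env-b": "10.0.16.0/20",
    "env-c": "10.0.8.0/21",
})
```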

The network isolation is governed using Virtual Private Cloud networks. A VPC is configured per project (and per stack) to create a “jail cell” for the resources and addressable endpoints such that their access is limited within the VPC. Manhattan Active® Platform supports two ways of inbound communication:

  • HTTPS traffic: Manhattan Active® Platform exposes two HTTPS endpoints: the Authentication endpoint and the Application endpoint. All inbound HTTP traffic must use TLS v1.2 or higher. The inbound HTTPS load balancer listens on port 443. The HTTPS endpoints accept traffic from the Manhattan Active® Platform web user interface, mobile applications, and REST clients.
  • Asynchronous traffic: All inbound messages are received via Google Pub/Sub topics and are then placed by Manhattan Active® Platform on internal queues for subsequent message handling. Access to Google Pub/Sub is restricted using stack-specific service accounts, which are shared with the customers for external integration.

Additionally, Manhattan Active® Platform supports a few administrative endpoints exclusively for use by the Manhattan Active® Operations team. Access to the administrative endpoints is restricted to members of that team. The diagram shown below illustrates the access control measures of the Manhattan Active® Platform control-plane (administrative access) and data-plane (application access):

As an option, Manhattan also offers technology consulting services and custom options for additional network security configurations, such as those listed below. Additional network security options may be considered and supported at Manhattan’s discretion.

  • Source IP whitelisting: As a default deployment, customer environments are accessible over the (public) Internet. While the HTTP traffic to Manhattan Active® Platform is always encrypted via TLS v1.2 or higher, customers may opt to whitelist source IP addresses to allow inbound traffic to Manhattan Active® Platform only from a set of known IP addresses.
  • Destination IP whitelisting: Outbound traffic from Manhattan Active® Platform originates from a set of known static IP addresses. Customers may opt to whitelist these IP addresses to allow traffic from Manhattan Active® Platform into the customer network.
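
Conceptually, source IP whitelisting reduces to a membership test against a set of known networks. The sketch below shows that test with the standard `ipaddress` module; the `ALLOWED` list uses reserved documentation address ranges and is purely illustrative. In practice such rules are enforced by network infrastructure rather than application code.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow list, using reserved documentation address ranges.
ALLOWED = [
    ip_network("203.0.113.0/24"),
    ip_network("198.51.100.17/32"),
]


def is_allowed(source_ip: str) -> bool:
    """Accept traffic only when the source address is in a known network."""
    addr = ip_address(source_ip)
    return any(addr in net for net in ALLOWED)
```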

Communication Security

Communication to, from, and within Manhattan Active® Platform forms a complex ecosystem. The Manhattan Active® Platform architecture ensures that all aspects of the communication system remain secure and protected from parties with potentially malicious intent. This section lists the key communication paths and their encryption:

  • All network traffic between Google Cloud Platform data centers is encrypted (TLS v1.2+)
  • All inbound network traffic to Manhattan Active® Platform is encrypted (TLS v1.2+)
  • All network traffic between Manhattan Active® Platform and Google Cloud services (such as Google Cloud SQL, Cloud Storage or Pub/Sub) is encrypted (TLS v1.2+)
  • All outbound network traffic from Manhattan Active® Platform is encrypted (TLS v1.2+). This applies to the invocations built into the base product; there could be additional custom invocations to 3rd party systems that may not support HTTPS traffic. While customers are highly discouraged from using unencrypted traffic, such invocations are not forbidden.

Service Accounts and Security Context

Asynchronous communication between Manhattan Active® Platform and the customer’s host systems is achieved using messaging via Google Cloud Pub/Sub. To authorize the messages, and to establish an authenticated system-user context, Manhattan Active® Platform relies on Google IAM Service Accounts and Roles.

To authorize Manhattan Active® Platform to pull and post messages to Pub/Sub, a built-in Service Account is used that is instrumented in the containers via a Kubernetes Secret. This Service Account has limited access to perform Pub/Sub operations necessary for topic management and message processing.

To authorize the customer to pull and post messages to Pub/Sub, a Service Account is created per deployment and shared with the customer via a secure encrypted channel. This Service Account has limited access: it can perform only the Pub/Sub operations necessary for the customer's business use cases, and only on the Pub/Sub topics that relate to that customer. The Service Accounts shared with customers can be recycled at the customer’s request. Manhattan does not require regularly scheduled recycling of the Service Account keys.

Upon receipt of an authorized message via Google Cloud Pub/Sub, Manhattan Active® Platform establishes a security context for the message using the username, organization name and tenant ID from the message header and processes the message within that context. Regardless of depth of the call stack, or the length of the chain of application components that process the message, the security context is always preserved. The context helps maintain the security and privacy of the data in the message by ensuring that the exposure of this data is limited to the processing elements pertaining to that user, organization, and tenant.

Data Encryption & Encoding

Manhattan takes customer data security very seriously and commits to rigorous compliance and protection policies for the customer data. To ensure data security and to prevent unauthorized access to data in case of a network security breach, data in Manhattan Active® Platform is encrypted in transit and at rest:

  • Encryption of data in transit: As described in the Deployment Security section above, inbound HTTP and messaging traffic is accepted only over HTTPS; application endpoints of Manhattan Active® Platform and Google Pub/Sub require traffic to use TLS v1.2 or higher.
  • Encryption of data at rest: Persistent volumes used by Manhattan Active® Platform services (Database, Elasticsearch and RabbitMQ) are encrypted by default. All persistent volumes offered by Google Compute Engine are encrypted out-of-the-box.
  • Encryption of payment data: In addition to storage encryption, Manhattan Active® Platform also encrypts payment data (such as gift card numbers, expiration dates, payment codes etc.) using a high-strength encryption algorithm (Blowfish) supported by JDK.
  • Encryption of sensitive properties: All sensitive configuration properties, such as API keys, account numbers, passwords, and access codes, are encrypted using a high-strength encryption algorithm provided by the Spring Crypto library (AES).
  • Password hashing: User passwords and PINs are hashed using a high-strength hashing algorithm provided by the Spring Crypto library (BCrypt).

Application Security

As shown in the diagram below, HTTP access to Manhattan Active® application resources is restricted to the users or systems authorized to access those resources. The restrictions are controlled via user authentication, grants-driven permission control, and data access. The authorization mechanism is implemented using the industry-standard OAuth2 protocol and the Spring Security framework.

  • Manhattan Active® Platform supports Open ID or SAML for identity verification and authentication.
  • The user sign-ons are SSO enabled. The OAuth2 access token obtained with the sign-in process is used throughout the session spanning across applications.
  • The invocations from the mobile application are stateless (no session maintained on the server). The invocations originating from the mobile app are OAuth2 enabled.
  • The invocations from the browser are stateful (a session is maintained per user sign-in on the server). The invocations originating from the browser are session ID enabled, where the access token is maintained in the server-side session.
  • The access tokens are created with a predefined expiration period. The token becomes invalid when it expires.
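
The expiration rule in the last item can be illustrated with a minimal token record. The `AccessToken` type and its fields are hypothetical and do not reflect the platform's actual token format.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessToken:
    value: str
    issued_at: float     # seconds since the epoch
    ttl_seconds: float   # predefined expiration period

    def is_valid(self, now: Optional[float] = None) -> bool:
        """A token is valid only until its expiration period has elapsed."""
        now = time.time() if now is None else now
        return now < self.issued_at + self.ttl_seconds
```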

The illustration below describes the authentication and access control flow for a user or system invoking the HTTP endpoints of Manhattan Active® Platform:

Authentication Modes

Manhattan Active® Platform standardizes authorization with OAuth2 for HTTP traffic. For identity provisioning and active directory integration, Manhattan Active® Platform supports a variety of authentication modes with Open ID and SAML:

  • External Authentication Mode with Open ID is a login configuration where the user is exclusively authenticated via an external Identity Provider using Open ID as the identity protocol. In this mode, all users are maintained by the external Identity Provider.
  • External Authentication Mode with SAML is a login configuration where the user is exclusively authenticated via an external Identity Provider using SAML as the identity protocol. In this mode, all users are maintained by the external Identity Provider.
  • Mixed Authentication Mode with User Discovery is a login configuration where the user identity is managed either in the Native authentication source by Manhattan Active® Platform, or by an External Identity Provider with Open ID or SAML. The determination for the authentication mode for the actual user is made in real-time when the user attempts to log in, based on the user’s username. In this mode, the user is first prompted to enter their username, and the UI then redirects to the authentication mode configured for that user.
  • Native Authentication Mode is a login configuration where the user is exclusively authenticated by Manhattan Active® Platform. In this mode, the user directory, and credentials (in the form of usernames and passwords) are maintained in Manhattan Active® Platform database. Users that are maintained with the native authentication mode are referred to as native users to distinguish them from the users that are maintained in the corporate directory. If no other authentication mode is configured, Native Authentication Mode is the default configuration.
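
User discovery in the Mixed Authentication Mode amounts to a lookup from username to the authentication mode configured for that user. The sketch below assumes a simple in-memory table and assumes that unknown usernames fall back to native authentication; both are illustrative choices rather than documented behavior.

```python
from enum import Enum


class AuthMode(Enum):
    NATIVE = "native"
    OPENID = "openid"
    SAML = "saml"


# Hypothetical per-user configuration.
USER_AUTH_MODES = {
    "jdoe@corp.example": AuthMode.SAML,
    "asmith@corp.example": AuthMode.OPENID,
    "temp01": AuthMode.NATIVE,
}


def discover_auth_mode(username: str) -> AuthMode:
    """Pick the login flow to redirect to, based on the entered username."""
    return USER_AUTH_MODES.get(username.strip().lower(), AuthMode.NATIVE)
```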

Manhattan supports known Open ID and SAML Identity Providers for configuring as the External Authentication Mode. Integration with the IDPs listed below has been tested and supported:

  • Microsoft Azure AD: Open ID & SAML
  • ADFS: Open ID & SAML
  • Okta: Open ID & SAML
  • CA SiteMinder: Open ID
  • Ping Identity: Open ID
  • MITREid: Open ID
  • KeyCloak: Open ID
  • IBM Security Access Manager: SAML

Note that IDPs differ in their details: some do not support the full Open ID or SAML standards, and some support additional features that are not part of the standards. In such cases, Manhattan offers technical support and consulting to accommodate the testing and validation necessary for a specific IDP to fully integrate with Manhattan Active® Platform.

User & Identity Types

Manhattan Active® Platform supports two user types for authentication:

(Human) users: The user accounts which are configured for the human users who use the application’s user interface and login via a web browser or a mobile application to access the system interactively.

  • Human users cannot access API endpoints, nor can invoke the application in a non-interactive mode.
  • Human users can be configured either via Native Authentication (see the section above) or maintained by an external Identity Provider. Manhattan Active® Platform supports the Mixed Authentication mode, where a subset of the users (typically temporary or non-corporate users) is maintained by Native Authentication, while the rest of the users (typically permanent or corporate users) are maintained by the external Identity Provider.
  • Certain (configurable) password restrictions and policies apply to human users maintained by Native Authentication.

Robot users: The user accounts which are configured for system-to-system integration for non-interactive access of Manhattan Active® application.

  • Robot users cannot access the browser or mobile app user interface.
  • All Robot users are exclusively maintained by Native Authentication and cannot be maintained by an external Identity Provider.
  • Password restrictions or policies do not apply to robot users. Their passwords can be cycled as needed, but the password cycling is not mandatory.
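
The access rules for the two user types can be summarized in a few lines. The enum and function names below are hypothetical and serve only to restate the rules above in executable form.

```python
from enum import Enum


class UserType(Enum):
    HUMAN = "human"
    ROBOT = "robot"


class Channel(Enum):
    UI = "ui"    # browser or mobile app user interface
    API = "api"  # non-interactive API endpoint


def may_access(user_type: UserType, channel: Channel) -> bool:
    """Human users are limited to the UI; robot users are limited to the API."""
    if user_type is UserType.HUMAN:
        return channel is Channel.UI
    return channel is Channel.API
```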

For more, see Auditability to understand how Manhattan Active® Platform captures activities and changes that can be used for tracking, troubleshooting, forensics, and learning.

Author

  • Kartik Pandya: Vice President, Manhattan Active® Platform, R&D.

3 - Auditability

Understand how Manhattan Active® Platform captures activities and changes that can be used for tracking, troubleshooting, forensics, and learning.

Introduction

Manhattan Active® Platform has built-in mechanisms to track and audit user activity, HTTP traffic, extension events, and changes to application configuration. Depending on the type of the tracking and auditing data, it can be accessed and viewed by the customer via the application UI, REST API, Pub/Sub integration or data replication. The sections below summarize the tracking and auditing mechanisms:

Activity Stream

Activity Stream is a module in Manhattan Active® Platform responsible for tracking and logging all system interactions that involve the applications deployed on the platform. These interactions include:

  • Inbound HTTP invocations: All HTTP calls received by Manhattan Active® Platform, including REST API invocations and traffic originating from the web UI or mobile apps.
  • User Authorization requests: All requests for accessing the user’s OAuth2 access token, or reading or updating user information such as roles, grants, organization structure etc.
  • Security configuration access: All requests for reading or updating security configuration such as IDP integration, Client IDs/secrets etc.
  • Extension handler invocations: All events pertaining to the custom extension such as synchronous call-outs (user exits) and extension points
  • Outbound HTTP invocations: All HTTP calls made by Manhattan Active® Platform, including REST API invocations made to customer’s host systems or custom 3rd party addresses

These system interactions, or “activities” are captured and published to a dedicated topic in Google Cloud Pub/Sub as messages with JSON payload in a standardized format. These messages can then be consumed via the Pub/Sub subscriptions. Customers can optionally choose to develop a Pub/Sub consumer using the authorized Service Account key delivered by Manhattan to store, index, and process the activity stream data in the system of their choice.

Note that Manhattan Active® Platform does not track user authentication when the authentication is carried out by an external identity provider such as Okta or Azure AD. Tracking and auditing in Manhattan Active® Platform begin when it receives a request, or when it sends a request out to an external location.

The diagram shown below illustrates the basic design of Activity Stream, and the flow of the activities it captures:

Activity Stream data is used by Manhattan Reliability Engineering and ProdOps teams internally for monitoring, incident troubleshooting and change-detection purposes.

By default, the activities posted to Pub/Sub include only the message headers, and not the body of the message (meaning, the business data) due to potential privacy concerns that the customer may have. It is, however, possible to configure Activity Stream to capture the body of the messages if the customer desires to do so. If the capture configuration includes the message body, the storage and transmission costs may exceed the thresholds defined in the customer contracts, and the customer may be subjected to additional billing for the overages.
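
The header-only default can be pictured as a flag that controls whether the body is included in the published activity. The JSON shape and the `capture_body` parameter below are assumptions made for illustration; the actual standardized payload format is not described in this document.

```python
import json


def to_activity_message(headers: dict, body: str, capture_body: bool = False) -> str:
    """Build the activity payload; the body is left out unless
    capture is explicitly enabled."""
    activity = {"headers": headers}
    if capture_body:
        activity["body"] = body
    return json.dumps(activity, sort_keys=True)
```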

Contact Manhattan Sales or Professional Services for more information about implementing Activity Stream.

Audit Framework

Audit Framework is part of the Entity Framework library from the Manhattan Active® Platform. Audit Framework is instrumented in the application components as a runtime dependency and is responsible for capturing changes to configuration data on commit. Configuration data includes all entities that represent the metadata or settings of Manhattan Active® Platform and products that the customer may configure as part of the implementation process. While the full extent of Audit Framework functionality is out of the scope of this document, some examples of the configuration entities covered by the Audit Framework are listed below:

  • Authorization and access control configuration such as Users, Organizations, Roles, Grants
  • Business rules
  • Business configuration such as Locations, Business Units, Stores, Schedules
  • Payment configuration such as Payment Types, Parameters, Rules
  • Batch job configuration such as Recurrence, Parameters, Schedules
  • Custom extensions and message type configuration

Captured audit data (inserts, updates, and deletes) is indexed in Elasticsearch, and can be accessed via Audit user interface or REST API. More information about the Audit Framework can be found in the Manhattan Active® Platform product and platform documentation.
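
To illustrate the three kinds of captured changes, the sketch below compares two snapshots of configuration entries and classifies each difference as an insert, update, or delete. It is a conceptual example with hypothetical names and does not represent how Audit Framework captures changes on commit.

```python
def audit_changes(before: dict, after: dict) -> list:
    """Return (operation, key, old value, new value) for every difference."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append(("insert", key, None, after[key]))
        elif key not in after:
            changes.append(("delete", key, before[key], None))
        elif before[key] != after[key]:
            changes.append(("update", key, before[key], after[key]))
    return changes
```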

Audit data is used by Manhattan Reliability Engineering and ProdOps teams internally for monitoring, incident troubleshooting and change-detection purposes.

Change Detection

Manhattan Active® Platform includes a mechanism to detect and capture changes that take place in the system. The Change Detection mechanism can detect several types of changes and store them in a structured way for future reference or, in some cases, for creating alerts in the system that can be investigated:

  • Changes to deployment metadata and infrastructure: Container tag updates, Code Drop or Release ID changes, modification of global environment variables, sudden changes in compute allocation, etc.
  • Changes to application’s properties and feature flags: Properties or feature flag configuration injected to the Spring Boot runtime and stack specific custom overrides
  • Changes to application’s business configuration: Configuration that is created, updated, or deleted using the application user interface or REST API
  • Changes to custom extensions & integration: Changes to custom extension or external integrations developed by the customer or Manhattan’s Professional Services staff
  • Code changes between the previous and current Code Drop deploys

The Change Detection system exposes the detected changes (what changed, when, and how) in near real-time via user interface to the Manhattan Active® Operations team. While some of this information is also visible via the application’s user interface, most of it is used by Manhattan staff as part of managing the customer’s environment, and maintaining its availability.

Author

  • Kartik Pandya: Vice President, Manhattan Active® Platform, R&D.

4 - Enterprise Integration - Async vs Sync

Introduction

Manhattan Active® Platform is API-first, cloud-native, and microservices-based. All Manhattan Active® Solutions are born in the cloud, hosted on Manhattan Active Platform, and are highly available, elastic, scalable, secure, and resilient. For enterprise systems to integrate and communicate, Manhattan Active Platform provides both Synchronous and Asynchronous communication options. This document describes the integration landscape and compares options for enterprise integration with Manhattan Active® Solutions.

Integration Landscape

As depicted here, you may integrate with Manhattan Active Solutions using a variety of options. You do not need to use just one option, and can pick between them based on the use case, the nature and volume of the data being transmitted, and the amount of infrastructure you would like to manage. Some pros and cons are listed later in this document.

Integration Diagram

Note: Manhattan Integration Framework (MIF) is an optional offering. MIF is based on Software AG webMethods and is an enterprise integration tool that will help convert both inbound and outbound messages between Manhattan Active Solutions and any enterprise host system.

Synchronous REST Integration

All Manhattan Active® APIs follow the same calling mechanisms and uniformly communicate via JSON payloads. All APIs in Manhattan Active Solutions are accessed via HTTPS REST endpoints. So, an authenticated user can directly call any Manhattan Active API endpoint. With REST-based integration, feedback to the caller is immediate. However, the caller must have the necessary infrastructure to handle failures and scale calls to achieve the required throughput.
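
Failure handling on the caller's side often takes the form of retries with exponential backoff. The sketch below shows that pattern in a transport-agnostic way: `call` is any function that performs the request, and `TransientError` is a hypothetical exception the caller would raise for retryable failures such as timeouts.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(call, attempts: int = 4, base_delay: float = 0.5,
                    sleep=time.sleep):
    """Invoke `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test, since the waits can be recorded instead of actually performed.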

Asynchronous Queue Integration

Queue-based integration with Manhattan Active Solutions follows the same integration pattern for all interfaces. You build these asynchronous queue integrations using Google Pub/Sub and may choose from a wide range of integration options provided by Google that includes Pub/Sub APIs and a wide variety of client libraries. Once messages are posted into the dedicated Google Pub/Sub queue, Manhattan Active® Platform’s messaging infrastructure picks up and routes these messages to the target endpoints while supporting a full-featured message queue management system to manage the delivery of messages.

Note: Asynchronous Queue Integration delivers JSON messages to the same REST endpoints as Synchronous REST Integration with the added benefits of message queue management capabilities.
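
To make the posting step concrete, the sketch below prepares a request for the Google Pub/Sub REST publish method, which expects each message's data to be base64-encoded. The project, topic, attribute, and payload names are hypothetical; the real queue names and any required attributes come from your environment's configuration. Most integrations would use one of Google's client libraries rather than building the request by hand.

```python
import base64
import json
from typing import Dict, Optional


def pubsub_publish_url(project_id: str, topic_id: str) -> str:
    """URL of the Pub/Sub REST publish method for a topic."""
    return (
        "https://pubsub.googleapis.com/v1/"
        f"projects/{project_id}/topics/{topic_id}:publish"
    )


def pubsub_publish_body(payload: dict, attributes: Optional[Dict[str, str]] = None) -> dict:
    """Wrap a JSON payload in a publish request body (data is base64-encoded)."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    message: dict = {"data": data}
    if attributes:
        message["attributes"] = attributes
    return {"messages": [message]}


# Hypothetical project, topic, attribute, and payload values.
url = pubsub_publish_url("my-project", "inbound-orders")
body = pubsub_publish_body({"OrderId": "ORD-1"}, {"MessageType": "OrderImport"})
```

The caller posts `body` as JSON to `url` with Google credentials and can move on once Pub/Sub accepts the message; delivery to the target endpoint happens later.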

Integration Factors to Consider

This section provides guidance for choosing the appropriate approach for individual use cases by comparing Synchronous and Asynchronous capabilities against key integration requirements.

Scaling

Auto-scaling is the ability to automatically scale up or down to meet the fluctuations in demand for interface calls. This is a critical requirement for large-volume communication between external host systems and Manhattan Active Solutions.

Synchronous REST Integration | Asynchronous Queue Integration
Manhattan Active Platform REST services are scaled up automatically to meet demand based on reasonable API call rates. | Message processing rates (throughputs) are benchmarked and published for key interfaces and processes.
Downstream systems must size and scale the number of threads to achieve a specific, required throughput. | Queue processors are scaled up and down to achieve required throughputs by dynamically adjusting to the API response times and queue depths.

With Asynchronous Queue Integration, Manhattan Active Platform dynamically scales up and down the queue processors to meet the demands of large message inflows based on the response times of the target Manhattan Active API endpoints. It does this to account for variance in the response times stemming from the non-uniform nature of inbound payloads (for example: the save order endpoint will generally process an order with 10 lines faster than an order with 250 lines).

With Synchronous REST Integration, downstream systems will need to handle dynamic scaling by closely monitoring API response times and scaling up/down the number of threads making API calls to achieve the desired throughput. Manhattan Active Platform will automatically scale REST services based on the call rates but cannot guarantee response times due to the variability of the inbound payload. Also, note that there are appropriate guard rails in place to throttle the maximum number of calls allowed per second per API. In general, the rate limits set are higher than most reasonable business needs. Rate limits are necessary to protect the environment and maintain the overall health of the environment.

Typically, downstream systems integrating synchronously will require more infrastructure as they will need to wait for the synchronous calls to respond from the upstream system.
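
A rough way to size the caller's thread pool is Little's law: the number of calls in flight equals the call rate multiplied by the average response time. The sketch below applies that rule. The headroom factor and the optional rate cap are assumptions added for illustration, not values published by Manhattan.

```python
import math
from typing import Optional


def required_threads(
    target_calls_per_second: float,
    avg_response_seconds: float,
    headroom: float = 1.2,
    rate_limit_per_second: Optional[float] = None,
) -> int:
    """Estimate how many blocking caller threads sustain a target call rate.

    Little's law: concurrency = rate * latency. `headroom` (assumed) pads the
    estimate; `rate_limit_per_second` (assumed) caps the achievable rate.
    """
    rate = target_calls_per_second
    if rate_limit_per_second is not None:
        rate = min(rate, rate_limit_per_second)
    return max(1, math.ceil(rate * avg_response_seconds * headroom))


# The same target rate needs far more threads when payloads are slow to process.
fast = required_threads(100, 0.25, headroom=1.0)  # small payloads
slow = required_threads(100, 2.0, headroom=1.0)   # large payloads
```

Since response times vary with payload size, the estimate has to be recomputed as observed latency changes, which is the monitoring burden described above.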

If high throughput is a criterion, we recommend Asynchronous Queue Integration, which typically achieves better overall throughput and requires less infrastructure and management in the downstream systems.

Error Handling

Error handling is the ability to capture and handle errors. Errors can occur for various reasons, and such instances need proper handling.

Synchronous REST Integration | Asynchronous Queue Integration
Immediate error reporting via HTTP error codes and business validations. | Detailed logs of infrastructure errors, data errors, and service errors.
Caller owns error handling, retries, and failure resolution. | Automatic Retries: Messages are reposted after standard intervals. Failed Messages: Failures are captured and persisted for direct reposting, as well as data corrections.

With Asynchronous Queue Integration, Manhattan Active Platform provides comprehensive error handling capabilities which allow automatic and manual resolution of errors. In the event of a message posting failure, Manhattan Active Platform automatically retries posting the message multiple times after a configurable interval. After exhausting the retry limit, the message is posted into a staged queue for remediation.

Customers have multiple options to monitor staged queues:

  • Monitoring dashboard to periodically check for any error build-up.
  • Purpose-built user interface to manage failed messages including capabilities to correct and repost data.
  • API endpoints to programmatically manage failed messages as above.
  • Capability to configure alerts based on customizable data conditions to get notified of any failures as they happen.
  • Detailed logs of failures.

With Synchronous REST Integration, we recommend that the caller capture and correct errors, stage failed payloads, and repost them to continue business operations.
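
A minimal caller-side sketch of that recommendation follows. The set of retryable status codes, the backoff schedule, and the in-memory staging list are all assumptions for illustration. A real integration would persist staged payloads durably and choose retry rules that match the API's documented behavior.

```python
import time
from typing import Callable, List, Tuple

# Assumed classification: throttling and server-side errors are worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def post_with_retry(
    send: Callable[[dict], int],
    payload: dict,
    staged: List[Tuple[dict, int]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `send` (returns an HTTP status); retry transient failures, stage the rest."""
    status = 0
    for attempt in range(max_attempts):
        status = send(payload)
        if 200 <= status < 300:
            return True
        if status not in RETRYABLE_STATUS:
            break  # e.g. a validation error: the payload needs correction first
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    staged.append((payload, status))  # keep the payload for correction and reposting
    return False
```

`send` is any function that performs the REST call and returns the status code, which keeps the retry logic independent of the HTTP client in use.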

When immediate feedback is not a requirement, Asynchronous Queue Integration will provide better error handling and retry mechanisms. If the downstream system already has sophisticated error handling and retry mechanics, both approaches may be appropriate.

Traceability

Traceability is the ability to capture logs of service calls and the ability to debug/troubleshoot messages.

Synchronous REST Integration | Asynchronous Queue Integration
Limited to HTTP response codes and error responses. | Detailed logs of payload and message processing status are available; can be queried on demand.
Caller must trace the errors and remediate them as appropriate. | Manhattan Active Platform provides visibility into message routes, hops, and payloads to show exactly what happened to a message.

Like error handling above, traceability and audits of message inflow and status are your responsibility when Synchronous REST Integration is chosen. With Asynchronous Queue Integration, Manhattan Active Platform preserves detailed logs and payloads in addition to a full trace of errors encountered during message processing.

System Isolation

System isolation refers to the ability to independently operate one application regardless of the availability of another system to ensure systems don’t directly impact one another.

Synchronous REST Integration | Asynchronous Queue Integration
Creates a tighter coupling between the upstream and downstream systems. | Messages in queues are automatically retried when APIs are momentarily unavailable.
Caller is responsible for error handling and exceptions occurring due to the unavailability of Manhattan Active APIs. | Failure management capabilities guarantee the eventual delivery of messages.

Acknowledgement

Acknowledgment is the ability to receive a response or a confirmation after calling an API or posting a message.

Synchronous REST Integration | Asynchronous Queue Integration
Immediate [1] with standard HTTP response codes. | Fire and forget. Through technical configurations, Manhattan Active Platform can publish events [2] as acknowledgments.

Overall

Our experience shows that Asynchronous Queue Integrations typically provide better system isolation, error handling, traceability, and scaling whereas Synchronous REST Integration is the appropriate choice when an immediate response is needed for a business requirement.


  1. Based on API response times. Some business API endpoints can take multiple seconds to respond.

  2. Manhattan Active Platform events contain an entity body that is configurable. Events do not contain HTTP response codes.

5 - Extensibility & Configurability

What you can do to extend and configure the Manhattan Active® Platform and its solutions to meet your business requirements.

Introduction

Manhattan Active® Platform provides a comprehensive and flexible extensibility model for customizing the business workflows, data models, user interface, and system behavior. Extensibility allows the Manhattan Professional Service team and the customer to extend Manhattan Active® solutions without being impacted by (and without impacting) the base application upgrades or other maintenance activities carried out by Manhattan Active® Operations. This document summarizes the set of extensibility methods available as part of the Manhattan Active® Platform.

Extensibility Methods

Manhattan Active® Platform supports a variety of extensibility methods as described below:

  • Custom Data Attributes is a mechanism in Manhattan Active® Platform that lets customers introduce custom attributes as part of the base business objects. Custom data attributes can be added to virtually any base business object. Once added, the custom attributes can be used just like the base data attributes: they can be used to create additional fields in the UI, be used for calculations in the business logic, be updated via the UI or through the REST API, or be part of the search queries, filters, or business reports.

  • Extension Points: Manhattan Active® application components have built-in extension points for functional customizations. Call-outs to custom services can be configured at these extension points such that the custom services are invoked as part of a base workflow. The custom extension points are available throughout the base application functionality for a powerful extension model. Manhattan also evaluates the need for new extension points frequently and continues to insert them as part of application updates every few weeks. Extension points are designed to handle both synchronous and asynchronous call-outs from the base flow.

  • User Interface Extensibility feature in Manhattan Active® Platform allows customization of the web (browser) and mobile app user interfaces by extending parts of the UI to introduce new data fields, to change the behavior of the UI functionality, or to add custom screens as part of existing UI workflows. Manhattan Active® Platform includes a built-in UI extensibility designer as part of the ProActive tool-set that customers can use to extend the usability and look & feel of the UI while writing virtually no code.

  • REST API: Each Manhattan Active® Platform microservice publishes a REST API that handles virtually every business function and “CRUD” operation. Collectively, Manhattan Active® Platform exposes thousands of REST API endpoints which can be used as an instrument for integrating 3rd party systems with Manhattan Active® Platform. The REST API supports the same degree of access control that the Manhattan Active® Platform user interface provides. The REST API comes with Swagger documentation that outlines the usage and references.

  • External Integration support in Manhattan Active® Platform allows customers to integrate the base application with external systems (3rd party or home-grown) in the customer’s IT landscape. The platform supports external integration for inbound and outbound traffic via HTTP (synchronous) and messaging (asynchronous) invocations. Inbound integration can be achieved either by invoking the platform REST API from an external system or by sending inbound messages via Google Cloud Pub/Sub. Likewise, Outbound integration can be instrumented by defining synchronous call-outs (user exits) to invoke external services via HTTP or by sending outbound events to external systems via Google Cloud Pub/Sub.

  • Business Configuration is a built-in mechanism in Manhattan Active® Platform to configure the hundreds of settings available to the customer to customize how the application behaves and operates. Business configurations can be done via the application’s user interface, or using the REST API exposed for the purpose. Examples of business configurations include:

    • Authorization and access control configuration such as Users, Organizations, Roles, Grants
    • Business rules
    • Business configuration such as Locations, Business Units, Stores, Schedules
    • Payment configuration such as Payment Types, Parameters, Rules
    • Batch job configuration such as Recurrence, Parameters, Schedules
    • Message type configuration
  • Custom Config Store is a way to store custom configuration artifacts as part of the Manhattan Active® Platform such that they can be retrieved and referred to as part of other extensibility methods. The custom config store is also designed to handle encrypted artifacts including sensitive data such as 3rd party tokens or credentials, certificates, etc. that may need to be used for customizing the functional behavior of the application (for example, adding a new external integration to a payment system that requires an OAuth token for authorization). Data in the custom config store can be managed via its user interface, or by using its REST API. Data access of the custom config store is controlled by authorization and access control rules.
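
The extension point pattern in the list above can be pictured with a small sketch. It models extension points as named hooks in a base workflow, with in-process Python functions standing in for the custom services that the platform would call out to. The registry class, the extension point name, and the order fields are all hypothetical and do not reflect the platform's actual configuration model.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]


class ExtensionRegistry:
    """Maps extension point names to the custom handlers configured for them."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, point: str, handler: Handler) -> None:
        self._handlers[point].append(handler)

    def invoke(self, point: str, context: dict) -> dict:
        # With no handler configured, the base flow continues unchanged.
        for handler in self._handlers.get(point, []):
            context = handler(context)
        return context


def save_order(order: dict, extensions: ExtensionRegistry) -> dict:
    """Base workflow with one synchronous extension point before the save."""
    order = extensions.invoke("order:beforeSave", order)  # hypothetical point name
    return {**order, "status": "SAVED"}


def flag_large_orders(order: dict) -> dict:
    """Custom logic: set a custom attribute without touching base code."""
    return {**order, "isLargeOrder": len(order["lines"]) > 100}


extensions = ExtensionRegistry()
extensions.register("order:beforeSave", flag_large_orders)
saved = save_order({"id": "ORD-1", "lines": ["SKU-1", "SKU-2"]}, extensions)
```

The base workflow only knows the extension point name, so custom logic can be added or removed through configuration while the base code is upgraded independently.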

Learn More

Author

  • Kartik Pandya: Vice President, Manhattan Active® Platform, R&D.

6 - Asynchronous Workload Processing

Basic concepts of asynchronous workloads and the options available to process them consistently and at very high throughput.

Introduction

The Asynchronous Workload Processing Framework (or “AWPF” for short) is a framework instrumented in every microservice that runs as part of the Manhattan Active® Platform. AWPF enables extensibility, messaging, and batch execution within the business flows, and provides an abstract way to handle asynchronous communication between the microservices, or between the microservices and external systems in either direction. There is also a mechanism for helping components implement a basic “pipeline” – a data-defined sequence of states and corresponding services. This document covers the concepts of extension points and handlers, message types (inbound and outbound), service definitions, intermediate queues, and payloads.

AWPF Features

  • Simple configuration: Abstract out the configuration of queues through simple database entities
  • Multi-broker support: Ability to connect to multiple message brokers such as RabbitMQ, Kafka, Google Pub/Sub, and Amazon SQS
  • Connection and security: Manage the connectivity and security for asynchronous communication
  • Failed message handling: Handle retries for failed messages and replay failed messages with or without data correction
  • Zero message loss: Ensure zero message loss through multiple message persistence techniques
  • Consolidation and de-duplication: Ability to consolidate similar messages together and de-duplicate messages
  • Scheduled delivery: Ability to schedule the delivery of a message at a particular point in time. Send now, get it delivered later
  • Conditional messaging: Ability to send or suppress a message based on an MVEL condition, evaluated against the message content
  • Transformation: Ability to transform message contents through FreeMarker and Velocity templates
  • Metrics and dashboards: Provide visibility through runtime metrics and intuitive dashboards
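
Two of the features above, conditional messaging and de-duplication, can be illustrated with a short sketch. A Python predicate stands in for the MVEL condition, and the message fields and de-duplication key are hypothetical. This shows the idea only and is not how AWPF implements either feature.

```python
from typing import Callable, Hashable, Iterable, List


def filter_and_deduplicate(
    messages: Iterable[dict],
    condition: Callable[[dict], bool],
    key: Callable[[dict], Hashable],
) -> List[dict]:
    """Keep messages that satisfy `condition`, dropping later duplicates by `key`."""
    seen = set()
    result = []
    for message in messages:
        if not condition(message):
            continue  # suppressed by the condition
        message_key = key(message)
        if message_key in seen:
            continue  # duplicate of a message that was already accepted
        seen.add(message_key)
        result.append(message)
    return result


# Hypothetical inventory messages: suppress zero-quantity updates and
# send at most one message per item.
messages = [
    {"item": "SKU-1", "quantity": 5},
    {"item": "SKU-2", "quantity": 0},
    {"item": "SKU-1", "quantity": 5},
]
to_send = filter_and_deduplicate(
    messages,
    condition=lambda m: m["quantity"] > 0,
    key=lambda m: m["item"],
)
```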

AWPF Message Types

AWPF simplifies queue configuration for every microservice that defines one or more queues. From a microservice’s perspective, it can either send a message to a queue or receive a message from a queue. In asynchronous messaging terms, a microservice can either produce a message or consume it. A microservice defines the queues it sends messages to through the OutboundMessageType configuration. Similarly, it defines the queues it receives messages from through the InboundMessageType configuration. OutboundMessageType and InboundMessageType are defined via configuration entities in the database. Other connection details such as the messaging service (i.e., the “broker”), host names, port, and credentials are abstracted from the microservice code and are configured in the database.

At a high level, the message flow between microservices or via the external system integration can be described as the following:

There are different ways a message type is mapped to an actual queue:

  • The queue name directly maps to the message type name. This is the most commonly used pattern.
  • Inbound and Outbound message types support defining a specific queue name if it is different from the message type name.
  • The queue name may carry a system property placeholder, which AWPF evaluates and replaces with the configured property value at runtime.
  • A queue can also be defined with an absolute queue name by marking it as fullyQualified.
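
The mapping rules above can be sketched as a small resolver. Only `fullyQualified` is named in this document. The `${...}` placeholder syntax, the other field names, and the idea that non-absolute names receive an environment prefix are assumptions made to keep the example concrete.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")  # assumed placeholder syntax


@dataclass
class MessageType:
    name: str
    queue_name: Optional[str] = None  # hypothetical field: explicit queue name
    fully_qualified: bool = False     # models the fullyQualified marker


def resolve_queue_name(
    message_type: MessageType,
    properties: Mapping[str, str],
    prefix: str = "",
) -> str:
    """Resolve the actual queue name for a message type."""
    # Rules 1 and 2: default to the message type name unless a queue name is set.
    raw = message_type.queue_name or message_type.name
    # Rule 3: substitute system property placeholders at runtime.
    resolved = PLACEHOLDER.sub(lambda match: properties[match.group(1)], raw)
    # Rule 4: an absolute name is used as-is (the prefix is an assumption).
    return resolved if message_type.fully_qualified else prefix + resolved


properties = {"region": "us"}
default_name = resolve_queue_name(MessageType("OrderImport"), properties, "env1-")
templated = resolve_queue_name(
    MessageType("OrderImport", queue_name="orders-${region}"), properties, "env1-"
)
absolute = resolve_queue_name(
    MessageType("OrderImport", queue_name="shared-orders", fully_qualified=True),
    properties,
    "env1-",
)
```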

For more detailed reference, see AWPF Message Types

Batch framework and Scheduler

Microservices may need to execute batch jobs for processing data in bulk. A batch job defines two properties: (1) what the job should do, and (2) when to execute the job. The Batch Framework enables these capabilities via configuration that a microservice can leverage to define the jobs and the execution cadence.

Batch framework features

  • Simple configuration: Abstraction of the complex details of a job via a simple configuration mechanism
  • Multiple varieties of jobs: Batch framework supports different types of jobs such as Spring-batch jobs, Service jobs, BatchV2 jobs, and Agent jobs.
  • Status monitoring and completion callback: When a batch job executes asynchronously, this feature helps the user visualize the progress of the job execution by tracking the messages produced by the execution. Once a job is complete, the user can receive a callback with the execution status through the callback handler.
  • Facility/BU level jobs: While the job configuration entities are defined at an organization-profile level, there is provision to configure Node/BU level jobs through job parameters.
  • Heel-to-toe executions: If you want your next job execution to commence only when the previous execution has run to completion, you can use the heel-to-toe property for a job.
  • Timezone support: You can configure your job to run in a particular timezone. By default, all jobs execute in the UTC timezone.
  • Ad hoc job trigger: A job does not always need to be scheduled. A job can be executed without defining a recurring schedule for it by using the ad hoc job trigger.
  • On-demand trigger: If there is a need to trigger a scheduled recurring job out of turn, the on-demand trigger can be used to trigger the job without disrupting its regular cadence.
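
The heel-to-toe behavior in the list above can be illustrated with a small simulation. It assumes that a trigger which comes due while the previous execution is still running is deferred until that execution finishes; whether the Batch Framework defers or skips such a trigger is not stated here, so treat this as one possible reading.

```python
from typing import List


def heel_to_toe_starts(first_start: float, interval: float, durations: List[float]) -> List[float]:
    """Start times when each execution waits for the previous one to finish.

    Assumption: an overdue trigger is deferred, not skipped.
    """
    starts = []
    next_due = first_start
    previous_end = first_start
    for duration in durations:
        start = max(next_due, previous_end)  # never overlap the previous run
        starts.append(start)
        previous_end = start + duration
        next_due += interval
    return starts


# A job due every 10 minutes whose second run takes 15 minutes:
starts = heel_to_toe_starts(first_start=0, interval=10, durations=[4, 15, 3])
# -> [0, 10, 25]: the third run waits for the second to complete.
```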

Refer to this page for the job configuration details

Scheduler features

  • One-stop shop for job-related queries: Scheduler maintains a copy of all job configurations across microservices. It also maintains a record of upcoming and past executions. The Scheduler microservice provides a user interface and an API to retrieve this information.
  • High availability: Scheduler ensures high availability by electing one of the REST stereotype instances as the primary. If the primary instance is unavailable for any reason, another instance automatically elects itself as the primary and continues triggering the jobs.
  • Useful REST endpoints: Scheduler provides a REST API to initialize the jobs, create future execution backlogs, etc.
  • Internal housekeeping jobs: Scheduler has a few internal housekeeping jobs to help it manage the job triggers more effectively.
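
The primary election described in the list above is commonly built on a lease that the primary renews and others take over once it expires. The sketch below shows that general technique. It is not a description of the Scheduler's actual election mechanism, and the lease duration and in-memory lease object are assumptions; a real system would keep the lease in a shared store with an atomic update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    expires_at: float = 0.0


def try_acquire(lease: Lease, instance_id: str, now: float, ttl: float = 15.0) -> bool:
    """Return True if `instance_id` is the primary after this call.

    The current holder renews its lease; another instance takes over only
    when the lease is free or has expired.
    """
    free = lease.holder is None or now >= lease.expires_at
    if free or lease.holder == instance_id:
        lease.holder = instance_id
        lease.expires_at = now + ttl
        return True
    return False


lease = Lease()
a_is_primary = try_acquire(lease, "instance-a", now=0.0)  # a becomes primary
b_blocked = try_acquire(lease, "instance-b", now=5.0)     # lease still held by a
b_takes_over = try_acquire(lease, "instance-b", now=15.0) # a stopped renewing
```

Only the instance holding the lease triggers jobs, so a failed primary stops blocking the others as soon as its lease runs out.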

Refer to this page for details about scheduler component

Learn More

Authors

  • Subhajit Maiti: Director, Manhattan Active® Platform, R&D.

7 - Data Stream

Introduction to Manhattan’s answer for your data replication and archiving requirements.

Introduction

Data Stream is a sub-system of Manhattan Active® Platform that is responsible for replicating data from the production MySQL database to a configurable set of target data stores. Production data is replicated by reading the MySQL binary logs and is synced with the target in near real time. Data Stream is built using open-source components and a framework developed by Manhattan called Gravina.

Change data capture, or CDC, is a well-established software design pattern for a system that monitors and captures the changes in data so that other software can respond to those changes. CDC captures row-level changes to database tables and passes corresponding change events to a CDC stream. Applications can read these change event streams and access the change events in the order in which they occurred. Change data capture helps to bridge traditional data stores and new cloud-native event-driven architectures.
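
The sketch below illustrates why ordering matters in CDC: a consumer that applies row-level change events in sequence ends up with the same state as the source. The event shape (`table`, `op`, `pk`, `row`) is a hypothetical format chosen for the example, not the format Data Stream emits.

```python
from typing import Dict, Iterable, Optional, Tuple

RowKey = Tuple[str, object]


def apply_change_events(
    events: Iterable[dict],
    replica: Optional[Dict[RowKey, dict]] = None,
) -> Dict[RowKey, dict]:
    """Apply row-level change events, in order, to an in-memory replica."""
    replica = {} if replica is None else replica
    for event in events:
        key = (event["table"], event["pk"])
        if event["op"] in ("insert", "update"):
            # Merge the changed columns over any existing row image.
            replica[key] = {**replica.get(key, {}), **event["row"]}
        elif event["op"] == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {event['op']}")
    return replica


events = [
    {"table": "orders", "op": "insert", "pk": 1, "row": {"id": 1, "status": "NEW"}},
    {"table": "orders", "op": "insert", "pk": 2, "row": {"id": 2, "status": "NEW"}},
    {"table": "orders", "op": "update", "pk": 1, "row": {"status": "SHIPPED"}},
    {"table": "orders", "op": "delete", "pk": 2, "row": {}},
]
replica = apply_change_events(events)
```

Applying the same events in a different order (for example, the delete before the insert of order 2) would leave the replica out of sync, which is why consumers read change events in the order they occurred.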

Gravina Overview

Gravina is an internal system and framework library built by Manhattan that reads the source MySQL binary log, converts each event into a message in a CDC capture mechanism (Kafka), and relays the messages to the target system in a format the target system can interpret.

Customers can implement Data Stream to replicate production data to a set of supported target systems, which as of the writing of this document include the following:

Manhattan Active® Platform engineering team continues to build support for additional replication targets based on product management and customer requirements.

Gravina Architecture

Following is the summary of the Gravina components that enable the Data Stream functionality:

Extractor

Gravina Extractor is a Spring Boot microservice that embeds Shyiko mysql-binlog-connector-java and integrates with Apache Kafka. Extractor reads binary logs from the source MySQL database, converts each event into a message, and publishes them to Kafka. The extraction process executes as a single-threaded background job.

Kafka

Apache Kafka plays the role of the CDC capture system in the Gravina architecture. It is used to stream events from the extractor to the target consumer, which is one of the supported Gravina replicator components. The messages in Kafka are partitioned by database entity groups into separate Kafka topics so that they can be consumed concurrently by the replicators.
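
The partitioning idea can be sketched without a Kafka client: events for tables in the same entity group go to the same topic, which preserves their relative order, while different groups can be consumed in parallel. The table-to-group mapping, the topic naming scheme, and the event shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Hypothetical mapping of tables to entity groups.
ENTITY_GROUPS = {"orders": "order", "order_lines": "order", "items": "item"}


def topic_for(table: str, groups: Mapping[str, str], prefix: str = "cdc.") -> str:
    """Topic name for a table; unmapped tables share a default topic (assumed)."""
    return prefix + groups.get(table, "default")


def partition_events(events: Iterable[dict], groups: Mapping[str, str]) -> Dict[str, List[dict]]:
    """Group events by topic while keeping their original order within each topic."""
    topics: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        topics[topic_for(event["table"], groups)].append(event)
    return dict(topics)


events = [
    {"table": "orders", "pk": 1},
    {"table": "items", "pk": 7},
    {"table": "order_lines", "pk": 11},
]
by_topic = partition_events(events, ENTITY_GROUPS)
```

Keeping `orders` and `order_lines` in one topic means a replicator sees an order and its lines in source order, while `items` events can be processed concurrently by another consumer.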

Replicator

Gravina Replicator is a Spring Boot microservice that subscribes to the Kafka topics mentioned above to read the CDC events. These events are then converted to a payload format suitable for the target system. Each replicator implementation is specific to the target it serves (for example, replicator implementations for Google Cloud Pub/Sub target and a MySQL replica on customer’s Google Cloud SQL endpoint). As shown in the diagram above, to increase throughput, multiple instances of the replicator can be run to process CDC events from Kafka topics in parallel. Likewise, multiple types of Gravina replicators can also coexist in the same environment to transmit the replication events to different types of target systems concurrently.
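
To illustrate target-specific conversion, the sketch below turns one change event into two payload formats: a JSON message body for a messaging target and a parameterized MySQL statement for a database target. The event shape, the `id` primary key column, and the use of `REPLACE INTO` are assumptions for the example and do not describe Gravina's actual payloads.

```python
import json
from typing import Tuple


def to_pubsub_payload(event: dict) -> bytes:
    """Serialize a change event as a JSON message body."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def to_sql_statement(event: dict, pk_column: str = "id") -> Tuple[str, tuple]:
    """Convert a change event into a parameterized MySQL statement.

    Table and column names are interpolated, so they must come from
    trusted schema metadata; values are passed as parameters.
    """
    table = event["table"]
    if event["op"] == "delete":
        return f"DELETE FROM {table} WHERE {pk_column} = %s", (event["pk"],)
    columns = sorted(event["row"])
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"REPLACE INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, tuple(event["row"][column] for column in columns)


event = {"table": "orders", "op": "update", "pk": 1, "row": {"id": 1, "status": "SHIPPED"}}
message_body = to_pubsub_payload(event)
statement, parameters = to_sql_statement(event)
```

Both functions consume the same event, which mirrors how several replicator types can read the same topics and serve different targets side by side.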

Supported Replication Modes (Generally Available)

While Gravina can technically replicate the data stream to a wide range of target stores, presently Manhattan supports the following replication modes as generally available options:

Data Save with Google Cloud SQL

Production data is replicated to a Google Cloud SQL instance owned and managed by Manhattan. The customer has private access to the database instance and read-only authorization to query and report from this database. Manhattan remains responsible for operating, monitoring, and maintaining the database instance.

Data Stream with Google Cloud Pub/Sub

Production data is streamed as CDC events to a Google Cloud Pub/Sub endpoint owned and managed by the customer. Manhattan will need authorization and network access to post events to this Pub/Sub endpoint. The customer remains responsible for operating, monitoring, and maintaining the Pub/Sub endpoint.

Data Stream with Google Cloud SQL

Production data is replicated to a Google Cloud SQL instance owned and managed by the customer. Manhattan will need authorization and network access to write the replication events to this database instance. The customer remains responsible for operating, monitoring, and maintaining the database instance.

Replication Modes in Preview (BETA)

The replication modes described below are in preview for a select set of implementations and are considered beta:

Data Save with Google Cloud Pub/Sub

Production data is streamed as CDC events to a Google Cloud Pub/Sub endpoint owned and managed by Manhattan. The customer has private access to the Pub/Sub endpoint and read-only authorization to consume the messages from the topic subscription. Manhattan remains responsible for operating, monitoring, and maintaining the Pub/Sub endpoint.

Learn More

Author

  • Kartik Pandya: Vice President, Manhattan Active® Platform, R&D.