Network Connectivity Issues

September 18, 2023 at 6:00 PM
Resolved after about 12 hours


Partial outage
Ledgers API
Compliance API
  • Resolved

    Starting around 10:20am PT, AWS experienced network latencies and errors caused by resource contention within the subsystem responsible for propagating network mappings within the Amazon Virtual Private Cloud.

    This affected the elasticity of our services, since we typically scale API resources in and out in response to demand. Our early attempts to restore operation by creating and moving services introduced latency, which surfaced as health check errors and connectivity timeouts with other supporting services.

    By around 9:15pm PT, we were no longer experiencing any resource contention and our systems operated normally.

  • Monitoring

    We continue to monitor the health of our systems. The error rates have fallen significantly and performance has stabilized. AWS is beginning to operate normally.

    We will provide a final update once recovery has been fully confirmed.

  • Monitoring

    AWS continues to work on their network mapping propagation. Our current plan is to monitor and investigate any additional mitigations in conjunction with the AWS mitigation phases.

    We expect a partial outage on all Modern Treasury endpoints, with elevated rates of 504s.

    We will continue to provide updates every 90 minutes, or as we have additional information to share.

    The Modern Treasury API offers idempotent requests to prevent accidental duplication of API calls. This feature is particularly useful when initiating actions such as money transfers, entity creation, or resource modifications. For instance, if a gateway timeout (504) occurs while creating a Payment Order, you can safely retry the request using the same idempotency key to ensure that only one Payment Order is created.
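
    The retry pattern described above can be sketched as follows. This is a minimal illustration, not an official client: the `send` callable, the function name `create_with_retries`, the backoff values, and the `Idempotency-Key` header name are assumptions for the example, so check them against the API documentation before use. The key point is that the idempotency key is generated once and reused on every retry.

```python
import time
import uuid
from typing import Callable, Dict, Tuple

GATEWAY_TIMEOUT = 504


def create_with_retries(
    send: Callable[[Dict[str, str]], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns something other than a 504.

    `send` is a hypothetical callable that performs the HTTP request
    (for example, creating a Payment Order) with the given headers and
    returns (status_code, response_body).
    """
    # Generate the key ONCE, outside the loop, so every retry carries
    # the same key and the server can deduplicate the request.
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # header name assumed

    status, body = 0, ""
    for attempt in range(1, max_attempts + 1):
        status, body = send(headers)
        if status != GATEWAY_TIMEOUT:
            return status, body
        if attempt < max_attempts:
            # Exponential backoff between attempts: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    # All attempts timed out; return the last response to the caller.
    return status, body
```

    In practice, `send` would wrap whatever HTTP client you already use. Generating a fresh key per retry would defeat the purpose, since the server would treat each attempt as a new request.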

  • Monitoring

    We continue to monitor the health of our services. The error rates have continued to fall and have stabilized at a very low rate.

    We expect to be back at healthy operation levels in a few hours, after AWS has completed their updates to address the network connectivity issues and errors affecting the Availability Zones in the us-west-2 region.

    You can see AWS's status updates here:

  • Identified

    We are continuing to work on mitigations. Our monitoring shows that our error rate has been steadily declining. From the beginning of the incident until now, our error rate has averaged around 1%, and our worst one-minute window peaked at a 7.2% error rate.

    AWS is actively working on fixing the us-west-2 Availability Zone issues. You can see their status update here:

  • Identified

    AWS is experiencing network latencies and errors in multiple Availability Zones in the us-west-2 region.

    Impact on our services is currently limited, thanks to our high-availability network layout within the us-west-2 region.

    We are currently looking into additional mitigations to further limit impact.