Resolving JFrog OCI Upgrade Migration Failure

Proven Strategies for Seamless Database Migrations

Was an OCI Load Balancer Causing the Migration Timeout?

By tracing data flows across our architecture, we discovered that an Oracle Cloud Infrastructure (OCI) Load Balancer—situated between Xray and the database—was timing out.

One of the database migration steps was taking longer than the load balancer’s configured idle timeout.

Once that threshold was hit, the connection would close. Xray, interpreting this as a database failure, would halt the process entirely and never retry. Restarting the entire migration was the only option.
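
The remedy for this class of failure is to raise the idle timeout on every load balancer and proxy between Xray and the database so that it exceeds the longest migration step. As a minimal sketch (not our exact configuration), the OCI Python SDK can update a listener’s idle timeout; the load balancer OCID, listener name, and 7200-second value below are placeholders.

    # Sketch: raise the idle timeout on the OCI load balancer listener sitting
    # between Xray and the database (OCID and listener name are placeholders).
    import oci

    config = oci.config.from_file()                      # reads ~/.oci/config
    lb_client = oci.load_balancer.LoadBalancerClient(config)

    LB_OCID = "ocid1.loadbalancer.oc1..exampleuniqueID"  # hypothetical OCID
    LISTENER = "xray-db-listener"                        # hypothetical listener name

    # Read the current listener so everything except the timeout stays the same.
    listener = lb_client.get_load_balancer(LB_OCID).data.listeners[LISTENER]

    details = oci.load_balancer.models.UpdateListenerDetails(
        default_backend_set_name=listener.default_backend_set_name,
        port=listener.port,
        protocol=listener.protocol,
        connection_configuration=oci.load_balancer.models.ConnectionConfiguration(
            idle_timeout=7200,  # seconds; must exceed the longest migration step
        ),
    )
    lb_client.update_listener(
        update_listener_details=details,
        load_balancer_id=LB_OCID,
        listener_name=LISTENER,
    )

The same idea applies to any other hop in the path: each internal or external proxy needs an idle timeout longer than the slowest migration step, or it will cut the connection first.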

What Went Wrong During Our JFrog Xray Upgrade in Oracle Cloud?

Earlier this year, what was supposed to be a straightforward upgrade in our DEV environment turned into a complex troubleshooting effort.

After upgrading JFrog Xray to its latest version, the service failed to start. An internal database migration process was introduced in this release—and it kept failing.

Despite running Xray reliably for years, this new version introduced issues we had never seen before. And so began an in-depth diagnostic journey that spanned multiple layers of the stack—from Oracle Cloud load balancers to Kubernetes resource tuning.

Why Did the JFrog Xray Database Migration Keep Failing?

The migration process would consistently hang—first at 50%, then at 56%—despite various configuration changes.

Logs showed the failure happening at the same point during each attempt, yet nothing in our core configuration had changed other than the Xray version.

We opened a ticket and collaborated closely with JFrog support during the initial investigation. While the root cause wasn’t immediately clear, both teams worked together to explore potential triggers.

As the issue persisted, we decided to take a deeper look under the hood to trace the behavior across our infrastructure and uncover hidden constraints.

What Underlying Infrastructure Issues Affected the Xray Upgrade?

We found other contributing factors as well:

  • The Kubernetes pod running Xray had insufficient resource requests, slowing down operations (a quick way to confirm this is sketched after this list).
  • The migration required downloading a vulnerability database several hundred gigabytes in size, something we hadn’t encountered before.
  • The load balancer in front of Xray wasn’t provisioned for high-throughput data transfers—essentially a narrow pipe for a high-volume download.
  • The storage layer backing the Xray database had lower-tier performance, affecting insert and index-rebuild operations during the migration.
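
Confirming the first point is straightforward from the cluster itself. The sketch below uses the Kubernetes Python client to print the requests and limits on the Xray pods; the xray namespace and the app=xray label selector are assumptions about the deployment layout, not values from our environment.

    # Sketch: print CPU/memory requests and limits for the Xray pods to spot
    # undersized allocations (namespace and label selector are assumptions).
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    pods = core.list_namespaced_pod("xray", label_selector="app=xray")
    for pod in pods.items:
        for container in pod.spec.containers:
            resources = container.resources
            print(
                f"{pod.metadata.name}/{container.name} "
                f"requests={resources.requests} limits={resources.limits}"
            )

If the values printed are well below what the migration actually consumes, the pod can be scheduled onto an already-busy node and starved under contention, which is consistent with the slowdown we observed.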

How Did We Tune Our OCI Infrastructure to Fix the Problem?

To overcome these challenges, we implemented several key changes:

  • Increased timeout values across internal and external load balancers and proxies.
  • Upgraded storage performance tiers, improving I/O throughput for database operations.
  • Adjusted Kubernetes pod resource allocations to better support compute and memory needs, similar to lessons from our custom NiFi OCI processor project.
  • Expanded load balancer bandwidth to accommodate large data downloads quickly and reliably (a sketch of this change follows the list).
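
To make the bandwidth change concrete, here is a rough sketch of widening a flexible OCI load balancer’s bandwidth range with the OCI Python SDK; the OCID and the 100–1000 Mbps range are placeholders rather than our production values.

    # Sketch: expand a flexible load balancer's bandwidth range so the large
    # vulnerability-database download is not throttled (OCID and Mbps values
    # are placeholders).
    import oci

    config = oci.config.from_file()
    lb_client = oci.load_balancer.LoadBalancerClient(config)

    LB_OCID = "ocid1.loadbalancer.oc1..exampleuniqueID"  # hypothetical OCID

    shape_update = oci.load_balancer.models.UpdateLoadBalancerShapeDetails(
        shape_name="flexible",
        shape_details=oci.load_balancer.models.ShapeDetails(
            minimum_bandwidth_in_mbps=100,
            maximum_bandwidth_in_mbps=1000,
        ),
    )
    lb_client.update_load_balancer_shape(
        load_balancer_id=LB_OCID,
        update_load_balancer_shape_details=shape_update,
    )

A flexible shape makes this kind of tuning reversible: the bandwidth range can be raised for the migration window and lowered again afterwards without recreating the load balancer.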

These changes shaved 10–15 minutes off the overall migration time and, more importantly, allowed the migration to complete successfully.

Did the Fix Work and What Did It Reveal About Running in OCI?

The upgrade has remained stable since the changes were applied, and the service continues to run reliably in production.

As part of our ongoing responsibilities, we regularly review performance metrics and monitor the environment to catch any signs of degradation early. Ensuring optimal performance isn’t a one-time task—it’s a continuous effort.

This experience highlights the importance of environment-aware tuning—from front-end load balancers all the way to storage IOPS. It also reinforces the value of trusted partnerships.

In this case, our customer initiated the upgrade, then escalated to us when issues arose. Working together, we delivered a robust solution.

How Can You Protect Mission-Critical Services During Cloud Upgrades?

Upgrades aren’t always predictable. As cloud environments become more layered and interconnected, new versions of software may introduce hidden demands on infrastructure that aren’t documented.

For our team, this case was a powerful reminder: successful upgrades require not just technical skill, but deep situational awareness across the full stack. That’s what allows us to consistently deliver outcomes—even when the path is uncertain.

Further Reading:

  • Why Oracle Cloud Infrastructure (OCI): Discover why we chose Oracle Cloud Infrastructure to meet the security, performance, and compliance needs of federal customers.
  • Secure Terraform Automation in OCI for Federal and DoD Teams: Storage performance tiers in OCI affected our migration, and tuning them correctly is especially critical in federal environments. See how 2i tackled another OCI case in this write-up.

Modernize Your Federal Infrastructure with 2i and OCI

Need to modernize legacy systems, meet evolving cybersecurity mandates, or modernize Oracle workloads in the cloud? Contact Ikeda Innovations to learn how Oracle Cloud Infrastructure (OCI) and 2i’s federal cloud engineering expertise can help your agency achieve mission success—securely, cost-effectively, and at scale.