Instructure Tech Blog

Torpedoes of nerdy truth from the engineering team at Instructure

Continuous Integration


Parallelization, Selenium, AWS, and a CI server are common tools in agile shops, and we are no exception. Since I’ve been here, maintenance hasn’t been a nightmare, nor has it been a dream. We’ve learned a few things that streamlined our maintenance process, and we would love to share what made our lives easier.

When we looked at what was eating up our time, the differences between our CI environment and the average developer’s environment were at the forefront. Most of us at Instructure use Macs, a few of us use Windows, and there are a handful of Linux users. Browser choice varies among us, but the most popular is Chrome. Engineers develop locally using their own OS, browser type and version, and the Ruby Selenium driver to power tests. Listed below are a few of the differences we found:

  1. CI ran on Ubuntu AWS images vs. mostly Macs in developer environments
  2. Variance in browser versions
  3. Selenium Grid with the Java standalone server in CI vs. local specs running on the native Ruby driver
  4. Variance in native event capabilities locally vs. Jenkins running exclusively with native events
  5. Parallelization in CI vs. single-threaded specs in developers’ environments

We began to strive for consistency between the average dev environment and the CI environment wherever we could, without sacrificing a production-like implementation. This mitigated many intermittency issues. It also simplified the CI upgrade process, edging maintenance toward dreamland and further from Elm St.

So, what did we do to make this happen? First, we changed all Selenium driver logic from Selenium Grid with the Java standalone server to the native Ruby Selenium driver. This change immediately produced noticeable stability improvements and made the intermittent failures that creep into tests easier to resolve.
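To make the switch concrete, here is a minimal sketch of driving a local browser through the native Ruby driver instead of a Grid hub. It is an illustration under assumptions, not our actual code: the `BROWSER` environment variable, the `SUPPORTED_BROWSERS` list, and the helper names `browser_for` and `build_driver` are all hypothetical. The only Selenium call used is `Selenium::WebDriver.for`, from the `selenium-webdriver` gem.

```ruby
# Browsers this sketch knows how to start locally (hypothetical list).
SUPPORTED_BROWSERS = %i[chrome firefox].freeze

# Picks the browser for this run from a hypothetical BROWSER variable,
# defaulting to Chrome. Accepts ENV or any Hash with string keys.
def browser_for(env = ENV)
  name = env.fetch("BROWSER", "chrome").downcase.to_sym
  unless SUPPORTED_BROWSERS.include?(name)
    raise ArgumentError, "unsupported browser: #{name}"
  end
  name
end

# Starts a browser on the same machine as the spec process.
# No hub and no Java standalone server sit between the spec and the browser.
def build_driver(env = ENV)
  require "selenium-webdriver" # loaded lazily; needs the selenium-webdriver gem
  Selenium::WebDriver.for(browser_for(env))
end
```

With something like this in place, a developer laptop and a CI box would go through the same code path, for example `driver = build_driver` followed by `driver.quit` when the spec finishes.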

In the past, the variance in environments made it hard to reproduce locally the errors our CI server threw. We spent a lot of time debugging these errors and had to create CI environments that devs could SSH into to debug the spec failures. This was highly inefficient and not consistent with the nature or true purpose of CI. Making the two environments consistent gave us more time to work on other things instead of on meaningless failures that were challenging and time-consuming to debug. Now we can easily reproduce most of the errors our CI server produces.

Along the way, we also found that Selenium Grid was not the ideal solution for us. The main reason was the variance in spec outcomes between dev environments and CI, which created too much overhead for maintenance and upgrades while also sapping dev time on CI spec debugging. Many shops use a hardware infrastructure distributed across multiple boxes, with tests farmed out to them; Selenium Grid is ideal for that setup. We, however, prefer one big box with parallelization done internally, which reduces the need for a Java standalone server and Selenium Grid. Driver capabilities are read from config files written at runtime for different browsers, and we have no need to push each test to a Selenium hub to be farmed out to a worker. We found that the more complicated Grid setup offered little reward and brought a noticeable burden in maintenance and stability while introducing variance in spec outcomes.
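The two ideas in this setup, a config file written at runtime and parallelization inside one box, can be sketched roughly as follows. This is a hedged illustration using only the Ruby standard library: the file name `selenium.yml`, the config keys `browser` and `native_events`, and the helpers `write_browser_config`, `load_browser_config`, and `partition_specs` are assumptions made for the example, and round-robin splitting is just one simple way to divide the work.

```ruby
require "yaml"
require "tmpdir"

# Writes a per-run browser config file and returns its path.
# The file name and keys are hypothetical.
def write_browser_config(dir, browser:, native_events:)
  path = File.join(dir, "selenium.yml")
  config = { "browser" => browser, "native_events" => native_events }
  File.write(path, config.to_yaml)
  path
end

# Reads the config back as a plain Hash for the driver setup code to consume.
def load_browser_config(path)
  YAML.safe_load(File.read(path))
end

# Splits spec files round-robin into worker_count groups so that each
# group can run in its own process on the same machine.
def partition_specs(spec_files, worker_count)
  groups = Array.new(worker_count) { [] }
  spec_files.each_with_index do |file, index|
    groups[index % worker_count] << file
  end
  groups
end
```

Each group returned by `partition_specs` could then be handed to its own worker process, with every worker reading the same config file, so no hub is needed to distribute tests.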

Making these changes made our lives noticeably easier. When managing or implementing a CI environment, we believe it’s prudent to consider not only your core tech architecture but also its relationship with your developers’ workflow. Although complex implementations of Selenium Grid and multiple browsers can be beneficial, they may add more complications than they are worth. We recommend a lean implementation of Selenium that uses the driver native to your application’s language, for tighter integration, easier maintenance, and more consistency between test runs.
