Consuming content from multiple CMSs with Gatsby

January 17, 2020

Owing to the emergence of the JAMstack, it is increasingly common knowledge that Gatsby (based on React) and other JAMstack technologies like Gridsome for Vue.js and the just-announced Scully for Angular provide a variety of competitive advantages when it comes to performance, resourcing, cost reduction, and security. In addition, Gatsby and others support an architectural paradigm I call the distributed CMS that enables development best practices like decoupled updates, pipelined development, interchangeable presentation layers and in-page services, and API convergence. As I wrote in my previous blog post, these and other evolutions are redefining the separation of concerns in content management systems (CMS).

But while Gatsby’s benefits for developer experience and front-end performance have seen much spilled ink, its robustness when it comes to content governance and CMS flexibility has received comparatively little attention. Gatsby can support multiple data sources and CMSs at the same time, and while many blog posts have focused on the merger of content and commerce systems in the JAMstack, static JAMstack sites are uniquely well suited to leveraging more than one CMS, for reasons I explore below. In this blog post, I share my own experience building my personal site, which consumes data from Drupal, Contentful, and Oracle Content and Experience (OCE), in order to motivate a multi-CMS architecture for your Gatsby or other JAMstack site.

Why consuming multiple CMSs used to be unrealistic

In the past, the notion of supporting a web front end with multiple CMSs was anathema to engineers for reasons of both performance and developer experience. First, relying on requests to multiple CMSs to provide the data necessary to the application inevitably led to considerable latency, as the retrieval of this data required multiple successive requests that engendered delays in both server-side and client-side rendering. Second, the idea of traversing differing data structures on discrete REST APIs with their own specifications is a nightmarish prospect for developers who just want to get on with it and render content into their applications as quickly as possible.

Luckily, the JAMstack and static site generators mitigate both of these issues almost entirely. First, because JAMstack technologies issue all API requests at build time, the end user never incurs the latency of successive API calls; only the build or deployment itself experiences a delay. Second, thanks to an additional API layer being available in the form of GraphQL in technologies like Gatsby and Gridsome, the many distinctions between API specifications are largely nullified.

As a result, many of the issues that were historically showstoppers are mitigated in this context. Outside the JAMstack, some CMS communities built their own workarounds for these problems. The Drupal community, for instance, created a variety of solutions, including Mateu Aguiló Bosch’s Subrequests module, which I cover in detail in my book Decoupled Drupal in Practice (Apress, 2018). Subrequests bundles several API requests into a single HTTP request, and requests that depend on the responses of earlier ones can include placeholders that are replaced with values from those responses once they are available. This is particularly useful when two pieces of related content need to be created, such as a tag (taxonomy term) and an article referring to that tag.
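To make the placeholder idea concrete, here is a simplified, hypothetical simulation in JavaScript. It is not the Subrequests module or its blueprint format: the {{requestId.path}} syntax and the send() function are stand-ins chosen for illustration. Requests run in order, and each request body may reference values from earlier responses, as in the tag-then-article case described above.

```javascript
// Simplified, hypothetical simulation of dependent requests with placeholders.
// NOT the Subrequests blueprint format: the {{requestId.path}} syntax and the
// `send` function are stand-ins chosen for illustration.

// Replace each {{requestId.some.path}} in a string with a value from an earlier response.
function fillPlaceholders(text, responses) {
  return text.replace(/\{\{(\w+)\.([\w.]+)\}\}/g, (match, requestId, path) => {
    if (!(requestId in responses)) {
      throw new Error(`Request "${requestId}" has not completed yet`)
    }
    return String(path.split('.').reduce((value, key) => value[key], responses[requestId]))
  })
}

// Resolve placeholders in every string inside a (possibly nested) request body.
function resolveBody(value, responses) {
  if (typeof value === 'string') return fillPlaceholders(value, responses)
  if (Array.isArray(value)) return value.map(item => resolveBody(item, responses))
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, resolveBody(item, responses)])
    )
  }
  return value
}

// Send requests one after another so later bodies can reference earlier responses.
async function runInOrder(requests, send) {
  const responses = {}
  for (const { requestId, body } of requests) {
    responses[requestId] = await send(resolveBody(body, responses))
  }
  return responses
}
```

The real module handles this on the server in a single round trip; the point of the sketch is only the dependency ordering that placeholders express.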

Why pull content from multiple CMSs?

Another relevant question for this architectural approach is why multiple CMSs are necessary in the first place. After all, don’t additional CMSs add more points of failure and more complexity, as well as higher maintenance costs, to a content architecture? This is admittedly true, but leveraging multiple CMSs provides certain advantages, including the rarely mentioned advantage that spreading data across a variety of servers enables that data to be better protected. If an attacker accesses one CMS, they are certainly not guaranteed access to the others in a multi-CMS architecture. See my recent keynote at SecOS Day Sofia for more information about how the JAMstack improves security outcomes.

Perhaps more immediately relevant is the notion that pulling data from multiple CMSs allows you to present disparate content under a single design that spans all of it, meaning that visitors are unlikely to notice which CMS backs any given page.

Accessing differentiated features and ecosystems

However, in my opinion, the most important motivation to employ multiple CMSs for a single Gatsby or other JAMstack site is access to a larger feature set more specialized for your requirements. For example, Oracle Content and Experience (OCE) and Drupal both have rich content modeling capabilities that allow for arbitrary content types and fields to be combined with one another, facilitating the creation of relational content adhering to a highly customized data structure.

Another distinguishing feature of headless CMSs like Oracle’s OCE and Contentful is robust rich text handling. Both Contentful and OCE offer solutions that allow developers to consume rich text gracefully by traversing an abstract syntax tree (AST) generated by the rich text structures. This provides a compelling alternative to solely consuming plain text, and I leverage rich text and embedded images extensively in my blog. In addition to rich text, other features in the API, such as the availability of internationalization and localization or the ability to negotiate between different serialization formats (e.g. JSON, XML, and even HAL+JSON and CSV in Drupal), are key to the multi-CMS developer experience for building CMS-dependent applications.

One final consideration, apart from the differences in feature sets, surrounds the ecosystem and community around the CMS itself. While a secondary concern in relation to available functionality, the ability to rely on an active maintainer community and support forum can be critical for the success of developers building applications against multiple CMSs. Consider, for instance, that Drupal has a highly engaged API-first initiative community on the #contenta channel in Drupal Slack. As another example, Oracle’s OCE now has a Spectrum community where developers needing support can have their questions answered. Oracle’s OCE, Contentful, and Prismic, meanwhile, boast substantial software development kit (SDK) ecosystems, as does Drupal in the form of the Contenta ecosystem and the sadly now-deprecated Waterwheel suite of SDKs and reference builds.

Why Gatsby is optimal for consuming multiple CMSs

Gatsby is optimal for consuming multiple CMSs because it abstracts away many of the distinctions between their approaches and APIs. The most visible layer in Gatsby where this can be seen is the GraphQL API, the query language and unified entry point for all data consumption in this JAMstack framework. This means that irrespective of where your content is administered, Gatsby exposes it through GraphQL fields associated with each CMS. Though each CMS’s source plugin follows a slightly different naming convention, they all provide top-level fields directly on the root query type.
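To illustrate what that unified layer spares you from, here is a hypothetical sketch of the hand-written adapters you would otherwise need to map two CMS response shapes onto one record shape. The Drupal input follows JSON:API conventions (data, attributes); the OCE input shape (items, name, fields.date) and the field names themselves are assumptions for illustration only.

```javascript
// Hypothetical adapters: map two CMS response shapes onto one common record.
// Drupal input follows JSON:API conventions; the OCE shape (items/name/fields)
// and field names like field_date and fields.date are assumptions.

function fromDrupalJsonApi(document) {
  return document.data.map(resource => ({
    source: 'drupal',
    id: resource.id,
    title: resource.attributes.title,
    date: resource.attributes.field_date,
  }))
}

function fromOce(response) {
  return response.items.map(item => ({
    source: 'oce',
    id: item.id,
    title: item.name,
    date: item.fields.date,
  }))
}
```

In Gatsby, each source plugin does the equivalent work once, at build time, so page queries never need code like this.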

With Gatsby, moreover, the flexibility surrounding the use of multiple CMSs truly shines. First, because the end user experiences no latency due to delays in API responses, developers can postprocess data at build time instead of relying on preprocessing on the server. For example, consider a scenario in which a list of content items from the CMS needs to be sequenced in descending order by the CMS’s date field. Rather than this sort operation occurring on the CMS (potentially because another consumer already depends on the default sorting of the API resource), the sort can take place within the GraphQL query through a sort argument that is applied only at build time. This means that APIs offering a particular order for content to serve other consumers can allow those sequences to be easily manipulated in Gatsby, all with no latency experienced by the end user in the browser.
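Gatsby expresses this with the sort argument in GraphQL, as the queries later in this post show. As a plain-JavaScript illustration of the same build-time postprocessing idea, the sketch below sorts a copy of an API result newest-first while leaving the API’s own order untouched for other consumers. The dateField parameter stands in for whatever date property your CMS exposes (for example, field_date in Drupal).

```javascript
// Sort a copy of API results newest-first, leaving the original array
// (and therefore the API's own ordering) untouched for other consumers.
// `dateField` is whatever date property the CMS exposes; it is a parameter here.
// Values must be parseable by Date.parse (e.g. ISO 8601 strings).
function sortByDateDesc(items, dateField) {
  return [...items].sort((a, b) => Date.parse(b[dateField]) - Date.parse(a[dateField]))
}
```

Copying before sorting matters because Array.prototype.sort sorts in place.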

Second, because Gatsby pages are rendered at build time, end-user latency no longer caps the number of CMSs you can consume in a single page, since the deployment, not the end user, is what incurs the latency of each additional CMS. Within Gatsby’s GraphQL API, you can use GraphQL aliases to query the same field more than once, as you can see in this multilingual Contentful example, which invokes the same allContentfulProduct field twice by aliasing the two instances as us and german for localization. In addition, you can query data provided by two different CMSs in the same query without any aliases, allowing Contentful and Oracle OCE data to coexist on a single page with little overhead, for example.
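Here is a sketch of what such an aliased query and its result might look like, modeled on that multilingual example. The name field and the de locale code are assumptions about the content model. Because gatsby-source-contentful creates one node per entry per locale, the helper pairs the two aliased lists by contentful_id, which is shared across the locale variants of the same entry.

```javascript
// Aliased page query (in a page file this would go inside Gatsby's graphql tag).
// The `name` field and the `de` locale code are assumptions about the content model.
const productsQuery = `
  {
    us: allContentfulProduct(filter: { node_locale: { eq: "en-US" } }) {
      edges { node { contentful_id name } }
    }
    german: allContentfulProduct(filter: { node_locale: { eq: "de" } }) {
      edges { node { contentful_id name } }
    }
  }
`

// Pair each English product with its German counterpart by contentful_id,
// which is the same for every locale variant of one Contentful entry.
function pairLocales(data) {
  const german = new Map(data.german.edges.map(({ node }) => [node.contentful_id, node]))
  return data.us.edges.map(({ node }) => ({
    id: node.contentful_id,
    en: node.name,
    de: german.has(node.contentful_id) ? german.get(node.contentful_id).name : null,
  }))
}
```

The aliases become the top-level keys of the query result (data.us and data.german), which is what the helper reads.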

How to consume multiple CMSs in Gatsby

[Figure: Consuming content from multiple CMSs with Gatsby: site architecture]

Above is a diagram that outlines the architecture of my personal site, preston.so (view on GitHub). As you can see, I’m using Contentful for my blog articles, along with Contentful’s rich text libraries, which convert the traversable JSON structure of Contentful’s rich text field into renderable React elements. For my speaking history, I used Drupal’s custom content modeling to quickly provide a content type and a set of fields for each speaking engagement. I did the same with Oracle’s OCE for my list of press appearances, using OCE’s rich functionality for custom content modeling.

The means by which Gatsby retrieves data from each respective CMS and makes it available through the GraphQL API is the source plugin, an add-on to Gatsby sites that issues API requests on behalf of the developer. Gatsby’s tutorial does an excellent job explaining how source plugins work and how to use them in your site.

To play around with my site as it stands now, you can clone my GitHub repository, install dependencies, and initialize a local server by executing these commands:

# Install Gatsby’s command-line interface if you don’t have it already.
$ npm install -g gatsby-cli

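# Clone repository, install dependencies, and start site.
$ git clone https://github.com/prestonso/preston-so.git
$ cd preston-so/
$ npm install
# The build expects CMS credentials (see the configuration sections below).
$ gatsby develop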

Installing multiple source plugins

Because any number of Gatsby source plugins can be added to a site, we can simply add all of our needed source plugins in quick succession. If you’re starting from scratch, you can use the default Gatsby starter (starters are ready-made Gatsby sites that kickstart development) and execute this command within the root directory. The npm install command below saves the source plugins as dependencies in package.json:

$ npm install --save gatsby-source-drupal gatsby-source-contentful

If you are using Oracle’s OCE, the source plugin is not currently available as part of the source plugins in the Gatsby monorepo; instead, it is part of Dolf Dijkstra’s Gatsby starter for OCE, cec-starter-default. In my repository, I take the same approach as Dolf, providing the Gatsby plugin directly within my codebase and placing it in a plugins folder at the root directory.

Once the source plugins are installed, we need to add them to the gatsby-config.js file familiar to Gatsby developers and configure each one according to the options it expects. To add a source plugin, place it inside the plugins array, as you can see in the example below. Each source plugin typically requires an options object, which provides configuration such as the site serving as your content repository.

// gatsby-config.js
module.exports = {
  siteMetadata: {
    title: `preston.so`,
    description: `Product strategist, innovation lead, developer advocate`,
    author: `@prestonso`,
  },
  plugins: [
    {
      resolve: `gatsby-source-drupal`,
      options: {
        // Configuration goes here
      },
    },
  ],
}

Configuring gatsby-source-drupal

The Gatsby source plugin for Drupal, gatsby-source-drupal, requires a relatively small options object. In my own gatsby-config.js, I’ve provided a baseUrl in the form of my Drupal site hosted on Pantheon:

// gatsby-config.js
module.exports = {
  // siteMetadata
  plugins: [
    {
      resolve: `gatsby-source-drupal`,
      options: {
        baseUrl: `https://live-prestonso.pantheonsite.io/`,
      },
    },
  ],
}

If you are using a customized path to enable situations like API versioning, as JSON:API in Drupal allows, or to serve differentiated APIs on the same root URL, you can provide an additional apiBase option to replace the default jsonapi path prefix:

// gatsby-config.js
{
  resolve: `gatsby-source-drupal`,
  options: {
    baseUrl: `https://live-prestonso.pantheonsite.io/`,
    apiBase: `api`, // Defaults to `jsonapi`
  },
},

Configuring gatsby-source-contentful

For Contentful, we need an approach that allows us to provide crucial information such as the spaceId and accessToken, both of which constitute privileged information that should only be supplied through environment variables. Environment variables allow us to keep sensitive credentials out of public repositories on GitHub and other source control providers. We can store environment variables either in our local development environment by means of a .env file or on our hosting provider (I use Netlify) through the platform’s user interface for environment variables.

I use Contentful’s recommended approach for providing Contentful credentials, namely a .contentful.json file on your local machine and environment variables entered into your hosting service. First, we set the contentfulConfig variable to retrieve the spaceId and accessToken from environment variables, falling back to .contentful.json locally:

// gatsby-config.js
// Start from an empty object so the fallbacks below work even without a local file
let contentfulConfig = {}
try {
  // Load the Contentful config from .contentful.json, if the file exists
  contentfulConfig = require('./.contentful')
} catch (_) {
  // No local file (e.g. on the hosting provider): rely on environment variables
}
// Overwrite the Contentful config with environment variables if they exist
contentfulConfig = {
  spaceId: process.env.CONTENTFUL_SPACE_ID || contentfulConfig.spaceId,
  accessToken: process.env.CONTENTFUL_DELIVERY_TOKEN || contentfulConfig.accessToken,
}
const { spaceId, accessToken } = contentfulConfig
// Fail the build early if either credential is missing
if (!spaceId || !accessToken) {
  throw new Error(
    'Contentful spaceId and the delivery token need to be provided.'
  )
}

Then, we need to pass this configuration object to Gatsby as the options object in gatsby-config.js:
// gatsby-config.js
{
  resolve: `gatsby-source-contentful`,
  options: contentfulConfig,
},

Configuring gatsby-source-cec

Finally, for Oracle Content and Experience (OCE), after placing our gatsby-source-cec plugin into the plugins folder at the root of our codebase, we need to provide a contentServer value (in this case my OCE instance) as the base URL of our OCE site.

We also require a channelToken, which can be set and refreshed in OCE’s administrative interface and should be converted into an environment variable (as we will do with .env shortly) so that it never lands in source control, where anyone could use it to send requests to your instance, for example as part of a distributed denial-of-service (DDoS) attack. Cross-origin resource sharing (CORS) configuration that limits API access to your site’s domain adds some protection, but CORS is enforced by browsers only, so it does not stop requests sent from scripts or servers:

// gatsby-config.js
{
  resolve: `gatsby-source-cec`,
  options: {
    contentServer: `https://prestonso-prestonso.cec.ocp.oraclecloud.com/`,
    channelToken: process.env.OCE_ACCESS_TOKEN,
    fromCache: false,
  },
},

Earlier in the file, I’ve also invoked require() to ensure that .env files are handled appropriately with dotenv:

// gatsby-config.js
require('dotenv').config({
  path: `.env.${process.env.NODE_ENV}`,
})

Finally, in our .env.development file, which represents our local development environment, we can insert the necessary credentials, including the OCE_ACCESS_TOKEN, which I’ve obfuscated in this example:

# .env.development
OCE_ACCESS_TOKEN=1a2b3c4d5e

Working with content from multiple CMSs

Now that we have our Gatsby site configured appropriately with the required source plugins, base URLs, and credentials, we can consume the content now housed within the GraphQL API. For developers less familiar with Gatsby, the framework makes a handy query debugging interface, GraphiQL, available at http://localhost:8000/___graphql when you run the gatsby develop command, which spins up a local development server with hot reloading. With GraphiQL, you can test arbitrary queries and see the response appear on the right side of your screen.

The way Gatsby’s GraphQL API articulates each CMS’s data depends on how each CMS structures that data once you drill deeper than a top-level field like allNodeTalk (Drupal: the entity type, node, followed by the bundle, or content type, name, talk) or allContentfulArticle (Contentful: the Contentful prefix followed by the content type name).

Gatsby has the concept of a page that is analogous to routes in other JavaScript frameworks. Each page can issue a query against Gatsby’s GraphQL API to provide the data used by its rendering code. This same-file colocation of GraphQL data requirements and declarative rendering with JSX is one of the most commonly touted features of Gatsby and React at large, because it enables developers to see both the data requirements for a component and its use in rendering within a single file.

Handling Drupal content

In src/pages/speaking.js, I’ve created a page for my speaking engagements that consumes a “Talk” content type in Drupal. The following query, which makes use of the allNodeTalk top-level field, retrieves content items of the type Talk from data downloaded from Drupal:

// src/pages/speaking.js
query {
  allNodeTalk(
    sort: {
      fields: field_date
      order: DESC
    }
  ) {
    edges {
      node {
        id
        date: field_date
        link: field_link {
          uri
        }
        event: field_event
        video: field_video {
          uri
        }
        slides: field_slides {
          uri
        }
        category: field_category
        location: field_location
        with: field_with
        title
      }
    }
  }
}

This query accomplishes two things, both of which are important from a developer standpoint. First, I sort the data sourced from Drupal within Gatsby’s GraphQL layer at build time, after Drupal’s API has already responded, rather than asking Drupal to sort it. Normally, in a non-JAMstack setting, implementing a sort operation in the client rather than the API would be inadvisable for performance reasons. However, in this case, because the end user never experiences the delay a sort operation would incur, we can leave this operation here rather than returning to the CMS to adjust the API resource or to create a separate listing of content items for consumption.

Second, Drupal by default prefixes the machine name of every custom field with field_, and its JSON:API implementation exposes those names as-is. This prefix is not only a telltale sign that we are consuming content from Drupal but also relevant to how Drupal’s theme layer functions. However, the field_ prefix can be an eyesore for developers who prefer to keep their naming consistent in their rendering code, and it is also an example of a leaky abstraction of Drupal that unnecessarily intrudes into the front end. As such, I’ve aliased the fields so that they can be used in a more developer-friendly way. This example also demonstrates another of Drupal’s leaky abstractions: link fields must be burrowed into in order to access the uri value contained therein, because Drupal stores each link as a structured value with properties such as uri and title rather than as a plain string.
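GraphQL aliases rename fields but leave link fields nested. If you would rather reshape Drupal data in JavaScript (say, in a helper shared by several components), a hypothetical function like the one below strips the field_ prefix and unwraps link fields to their uri. Which fields count as link fields is passed in explicitly rather than inferred; the names are illustrative.

```javascript
// Hypothetical helper: strip Drupal's field_ prefix from keys and unwrap
// link-style fields ({ uri, title, ... }) to their uri. Link fields are
// listed explicitly by their original machine names; nothing is inferred.
function cleanDrupalFields(node, linkFields = []) {
  const result = {}
  for (const [key, value] of Object.entries(node)) {
    const name = key.startsWith('field_') ? key.slice('field_'.length) : key
    // Empty link fields (null) are passed through unchanged.
    result[name] = linkFields.includes(key) && value ? value.uri : value
  }
  return result
}
```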

Handling Oracle OCE content

Meanwhile, in my page that uses content from Oracle Content and Experience (OCE), src/pages/press.js, the top-level field is allAppearance, matching the Appearance content type that I created in my OCE instance. This query also contains a sort operation that occurs just in time for the render process, as well as an argument to reformat a timestamp string into a human-readable date.

// src/pages/press.js
query {
  allAppearance(
    sort: {
      fields: date
      order: DESC
    }
  ) {
    edges {
      node {
        id
        title
        publication
        author
        date(formatString: "MMMM DD, YYYY")
        link
      }
    }
  }
}

The simplicity of this query matches the simplicity of the content type: no field is more complex than short text, and no aliasing is necessary.
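Gatsby’s date fields accept a formatString argument that uses Moment.js-style tokens. To show roughly what "MMMM DD, YYYY" produces, here is an approximation with the built-in Intl API. Interpreting the timestamp in UTC is an assumption of this sketch; Gatsby’s own handling of time zones may differ.

```javascript
// Approximates formatString: "MMMM DD, YYYY" (full month name, zero-padded day,
// four-digit year) using Intl. Interpreting the timestamp in UTC is an assumption.
function formatPressDate(isoString) {
  return new Date(isoString).toLocaleDateString('en-US', {
    timeZone: 'UTC',
    month: 'long',
    day: '2-digit',
    year: 'numeric',
  })
}
```

The Contentful query on the writing page uses "MMMM D, YYYY" instead, which drops the leading zero (March 5 rather than March 05); the equivalent here would be day: 'numeric'.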

Handling Contentful content

As for Contentful, I use the allContentfulArticle top-level field in my query in src/pages/writing.js. In addition, however, because I’m using Contentful’s rich text field rather than a plain text field as my body field, I need to drill into the field and access the json field contained therein, which yields a JSON-formatted tree containing my rich text and adhering to the aforementioned AST that Contentful provides.

// src/pages/writing.js
query {
  allContentfulArticle(
    sort: {
      fields: [originalPublicationDate]
      order: DESC
    }
  ) {
    edges {
      node {
        title
        slug
        originalPublicationDate(formatString: "MMMM D, YYYY")
        body {
          json
        }
      }
    }
  }
}
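Because the json field is an ordinary tree of nodes (each with a nodeType, and either a value for text or a content array of children), it can be traversed for purposes other than rendering HTML. As a hypothetical example, the helper below flattens an article body into plain text, which could feed teaser excerpts or meta descriptions. Contentful also publishes @contentful/rich-text-plain-text-renderer for this job; the sketch is only meant to show the traversal.

```javascript
// Hypothetical helper: flatten a Contentful rich text document (the `json`
// field) into plain text by walking the AST recursively.
// Node types that sit inline within a block (text runs and links).
const INLINE_TYPES = new Set([
  'text',
  'hyperlink',
  'entry-hyperlink',
  'asset-hyperlink',
  'embedded-entry-inline',
])

function richTextToPlainText(node) {
  // Text leaves carry their content in `value`.
  if (node.nodeType === 'text') return node.value
  const children = node.content || []
  const parts = children.map(richTextToPlainText)
  // Inline children concatenate directly; block children (paragraphs, headings)
  // are separated by a space, skipping blocks with no text such as embedded assets.
  const allInline = children.every(child => INLINE_TYPES.has(child.nodeType))
  return allInline ? parts.join('') : parts.filter(part => part !== '').join(' ')
}
```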

Each of my articles uses a template, src/templates/article.js, which Gatsby uses to render every article’s individual page. As an aside, I’m also using an article teaser component, src/components/article-teaser.js, to handle each article teaser (consisting of title and date) on the blog itself. I am also implementing Gatsby’s createPages API in gatsby-node.js to programmatically generate each individual article page, as you can see below:

// gatsby-node.js
const path = require('path')
exports.createPages = async ({ graphql, actions }) => {
  const { createPage } = actions
  const article = path.resolve('./src/templates/article.js')
  // Fetch just the fields needed to build each article's path
  const result = await graphql(`
    {
      allContentfulArticle {
        edges {
          node {
            title
            slug
          }
        }
      }
    }
  `)
  // Stop the build on query errors instead of continuing with missing data
  if (result.errors) {
    console.error(result.errors)
    throw new Error('Failed to query Contentful articles')
  }
  // Create one page per article; the slug is passed to the template's page query
  result.data.allContentfulArticle.edges.forEach(({ node }) => {
    createPage({
      path: `/writing/${node.slug}/`,
      component: article,
      context: {
        slug: node.slug,
      },
    })
  })
}
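Because createPage is called once per edge, two articles sharing a slug map to the same path, and the later call replaces the earlier page. This can happen without any typo: if your Contentful space has more than one locale, gatsby-source-contentful creates one node per entry per locale, so an unfiltered allContentfulArticle query returns each slug once per locale. A hypothetical guard like the one below can be called on the edges before creating pages.

```javascript
// Hypothetical guard for createPages: return slugs that appear more than once,
// since later createPage calls for the same path would replace earlier pages.
function findDuplicateSlugs(edges) {
  const counts = new Map()
  for (const { node } of edges) {
    counts.set(node.slug, (counts.get(node.slug) || 0) + 1)
  }
  return [...counts].filter(([, count]) => count > 1).map(([slug]) => slug)
}
```

If it reports duplicates caused by locales, one option is to add a filter such as node_locale: { eq: "en-US" } to the query.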

Handling rich text content from Contentful

Because we are using Contentful’s rich text field, we need to employ Contentful’s @contentful/rich-text-types and @contentful/rich-text-react-renderer libraries to convert the rich text AST in JSON, which could feasibly be rendered into any conceivable format besides HTML, into React elements that Gatsby can render alongside our JSX.

First, we need to retrieve our data using Gatsby’s GraphQL API:

// src/templates/article.js
export const pageQuery = graphql`
  query ArticleBySlug($slug: String!) {
    contentfulArticle(slug: { eq: $slug }) {
      title
      slug
      originalPublicationDate(formatString: "MMMM D, YYYY")
      body {
        json
      }
    }
  }
`

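Second, we need to import our Contentful rich text libraries, along with React, Gatsby’s graphql tag, and the lodash get helper used later in the template:

// src/templates/article.js
import React from 'react'
import { graphql } from 'gatsby'
import get from 'lodash/get'
import { MARKS, BLOCKS } from '@contentful/rich-text-types'
import { documentToReactComponents } from '@contentful/rich-text-react-renderer'
// Layout, SEO, and Card are this site's own components; their imports are omitted here.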

Third, we need to define how elements in Contentful’s rich text will be converted into JSX for rendering. Note that the following example can handle not only embedded images but also other embedded assets:

// src/templates/article.js
// Small wrapper components for marks and headings
const Bold = ({ children }) => <strong>{children}</strong>
const Italic = ({ children }) => <em>{children}</em>
const Underline = ({ children }) => <span className="underline">{children}</span>
const Code = ({ children }) => <pre><code>{children}</code></pre>
// Rich text headings are shifted down one level, leaving <h1> for the page title
const Heading1 = ({ children }) => <h2>{children}</h2>
const Heading2 = ({ children }) => <h3>{children}</h3>
const options = {
  // Marks wrap runs of text (bold, italic, and so on)
  renderMark: {
    [MARKS.BOLD]: text => <Bold>{text}</Bold>,
    [MARKS.ITALIC]: text => <Italic>{text}</Italic>,
    [MARKS.UNDERLINE]: text => <Underline>{text}</Underline>,
    [MARKS.CODE]: text => <Code>{text}</Code>,
  },
  // Nodes are block-level elements such as headings and embedded assets
  renderNode: {
    [BLOCKS.HEADING_1]: (node, children) => <Heading1>{children}</Heading1>,
    [BLOCKS.HEADING_2]: (node, children) => <Heading2>{children}</Heading2>,
    [BLOCKS.EMBEDDED_ASSET]: node => {
      // Embedded asset fields are keyed by locale; this example reads the en-US values
      const { title, description, file } = node.data.target.fields
      const mimeType = file['en-US'].contentType
      const mimeGroup = mimeType.split('/')[0]
      switch (mimeGroup) {
        case 'image':
          // Render images inline, using the asset's title and description
          return (
            <img
              title={title ? title['en-US'] : null}
              alt={description ? description['en-US'] : null}
              src={file['en-US'].url}
            />
          )
        case 'application':
          // Link to documents such as PDFs, labeled with the title or file name
          return (
            <a
              title={description ? description['en-US'] : null}
              href={file['en-US'].url}
            >
              {title ? title['en-US'] : file['en-US'].details.fileName}
            </a>
          )
        default:
          // Flag unsupported asset types visibly so they are easy to spot
          return (
            <span style={{ backgroundColor: 'red', color: 'white' }}>
              {mimeType} embedded asset
            </span>
          )
      }
    },
  },
}

Finally, we need to render the article JSON from Contentful (article.body.json) using the documentToReactComponents() function, which will perform the transformations we defined earlier:

// src/templates/article.js
class ArticleTemplate extends React.Component {
  render() {
    const article = get(this.props, 'data.contentfulArticle')
    return (
      <Layout>
        <SEO title={article.title} keywords={[`blog`, `articles`, `writing`, `tutorials`]} />
        <Card
          type="intro"
          orientation="no"
          title={article.title}
          body={article.originalPublicationDate}
        />
        <Card
          type="main"
          orientation="no"
          body={documentToReactComponents(article.body.json, options)}
        />
      </Layout>
    )
  }
}
export default ArticleTemplate

To see all of this in action locally with working code, see my GitHub repository.

Conclusion

Consuming content from multiple content management systems may have been an inadvisable proposition for performance reasons only a few years ago, but thanks to the advent of the static site generator and especially the revolution represented by JAMstack technologies like Gatsby, leveraging a multifaceted content architecture with multiple backing CMSs is now an increasingly common reality. Because JAMstack frameworks issue requests at build time, the end user never experiences delays due to API latency.

Throughout this blog post, we explored not only the high-level benefits and rationales of employing multiple CMSs in a JAMstack architecture but also the nitty-gritty of how to implement a multi-CMS architecture with Gatsby using popular content management systems like Drupal, Contentful, and Oracle Content and Experience (OCE), which differ in compelling ways in their APIs, data structures, and content modeling. These approaches open the door to a slew of opportunities and much more flexibility for developers who have demanding requirements of their APIs, high expectations when it comes to managing and rendering content, and a growing predilection for increasingly granular CMS architectures that approach the ideal of the distributed CMS.

preston.so