GraphQL to Neo4J

graphqlAbout a year ago I wrote a GraphQL to Neo4J mapping layer from our GraphQL implementation in NodeJS.  For those that don’t know much about Neo4J, it is a graph database that has it’s own query language called Cypher.  Like SQL, Cypher lets you query data from the database but it is specifically designed for querying nodes and relationships between nodes in a Neo4J database.

The GraphQL to Neo4J Mapping Layer

The GraphQL to Neo4J mapping layer I created converted GraphQL type search objects, passed into GraphQL queries as arguments, into Cypher queries executed against a Neo4J database.  Cypher results were retrieved and converted back into the shape required by the GraphQL api, including Neo4J properties and labels being mapped back to their GraphQL field names.

  1. A GraphQL query is called with a search argument describing what to search for.  For example: (and: [ { node: { type: { equals: “Person” } } }, { node: { name: { equals: “Joe” } } } ])
  2. Translate the GraphQL search object into Cypher query: ‘MATCH (Node:Person) WHERE node.fullName = “Joe” RETURN node’
  3. Translate the Person node returned from Cypher back into a JSON object that has the right property names (fullName -> name)
  4. Return the resulting list of people from the GraphQL query resolver

GraphQL to Neo4J Impedance Mismatch

All mapping layers have an impedance mismatch.  The GraphQL to Neo4J layer I created had these challenges:

Typed GraphQL to Loosely Defined Neo4J Properties

GraphQL is typed and each type has a specific set of properties.  You can create Union types that describe the return as one of many types but each item returned will be one of those types and have a specific set of properties.

Neo4J allows each node in the database to have a different set of properties.  Even nodes of the same type (think Person).  Some people can have a name.  Other people can have a fullName.  Others still can have a properName.  These sort of loosely defined properties presented a conceptual challenge.

In the end, I didn’t have to accommodate these types of loosely defined properties.  Our Neo4J data was loaded into the database programmatically so every Person did have the same types of properties.  If I had been required to solve these types of problems, I would have had to have a more complicated property map or instead of returning normal GraphQL types, I could have returned a JSON type which would allow for unspecified properties to be used.

Complex Cypher Queries Spanning Several Nodes

We didn’t require queries that spanned several nodes in the database but if we had it would have been similar to a SQL table join that returns data from several tables.  Each node returned would be related to each other in some way (described in the query or the results) and therefore could be mapped back to a proper return type and relationship type describing the connection between the two.

Common Mapping Layer Pattern

This mapping layer wasn’t the first mapping layer I’d created from Node or GraphQL and, it probably wasn’t isn’t even the tenth mapping layer I’ve created from some programming language to some database.

Database mapping layers have been around for a long time.  This mapping layer followed the patterns I’ve used previously and that we had already used for getting data from MariaDB using SQL and from our REST API building query strings for GET requests.  All mapping layers have these things in common:

  • Entity Map: Some way to know the mapping of entity names between your local code and the dababase (class to table, or graphql type to table, etc)
  • Property Map: Some way to know the mapping of entity field names to database table column names (or database node label/property names, etc)
  • Query Generator: Some way to convert a local description of a search into a query that executes that search in the database (search object to SQL or Cypher)
  • Helpers: Some helpers that use your entity map and property map and query generator to generate queries and return values with the correct entity types and property names
  • Management of the connection to the database (connection pools or creation of database connections)

Conclusions

Mapping From GraphQL to Neo4J mostly followed the database mapping patterns I’ve built before and / or seen before.  Like all database mapping layers there were some complexities to deal with but in the case of Neo4J the biggest hurdle I faced was learning how to use Neo4J and the Cypher query language to get data.  Once I understood the database and the query language, producing a framework for mapping from GraphQL to Neo4J wasn’t far off and turned out to be pretty fun journey.

 

 

Advertisements

Can GraphQL Call Itself?

graphqlIn GraphQL, an async resolver function resolves a query, mutation or a single field on a GraphQL type.  Each resolver function is passed 3 arguments:

  • object – the object you are resolving
  • args – the args passed to the field or query
  • context – a context for resolving queries, mutations or fields

The context argument can contain any information you put in it when starting to resolve a top level query or mutation in graphql.  Most of the time, this context contains DataLoaders, or a database connection or authentication information.

Typically the async resolver function will use the context to connect to a database or REST api, get or modify the data using the database or REST api and then return the necessary information.

Include graphql In The graphql Resolver Context 

Recently I did something that ended up being very useful for some of the complex work we were working on.  I added graphql to the context used when resolving a function in graphql.  Doing so made it possible to call graphql while resolving a graphql request and to make use of the same DataLoaders and other context information.

An Example – copyToDoList

To see the value in having graphql be able to call graphql, lets consider a simple ToDo list application and API.  Each User can have a ToDoList.  Each ToDoList can have ToDoItems.  When resolving a mutation to copyToDoList(id), it might be easiest to simply query for the existing ToDoList using the existing graphql query getToDoList(id), remove all the ids from the returned ToDoList and all the associated ToDoItems and then save the ToDoList and all the ToDoItems with no ids under a new name by calling the mutation to saveToDoList(ToDoList) which would save the list and list items and assign ids to all the entities and return the result.

We just implemented the copyToDoList mutation resolver by calling the getToDoList graphql query, then calling the saveToDoList graphql mutation and returning the result.

Conclusion

There are a number of benefits of enabling this type of functionality inside graphql resolvers.  First, it gives you the option to implement a resolver function by calling your own graphql queries or mutations, if that is the fastest or easiest way to do it.  Second, it allows your resolver function to deal with external names for entities and fields instead of the underlying names from database tables or column.  Third, it makes it possible to leverage existing graphql resolvers.  Finally, it compounds the performance savings by sharing the same DataLoaders for any queries or mutations you call while resolving which could result in fewer round trips to the database or REST api to get or modify the underlying data.

Creating A Public GraphQL API

graphql2017 saw the creation of several high profile public GraphQL APIs.  These GraphQL APIs are great for customer integrations, third-party integrators,  professional services and internal consumption.  When creating a public GraphQL API there are several important considerations.

Surface Area – Keep It Skinny

Deciding how much surface area to expose is important.  Exposing a too-limited set of apis and properties can result in an API that isn’t usable by some of your key consumers.  Exposing too many APIs and properties leaves you on the hook to support those APIs that you didn’t expect people to use, or in ways you didn’t expect them to be used.  The approach we took was to keep it skinny.  It isn’t just easier to add functionality when it becomes necessary.  Keeping the API skinny gives you a growth path and prepares API consumers for the continuous evolution which all public APIs inevitably go through.

Naming Conventions – Consistency Matters

Locking down simple naming conventions early is very important.  While the API is going to evolve, you don’t want to torture your customers with a continuous renaming of public APIs through new query/mutation names and the slow deprecation of old query/mutation names.  The naming conventions we settled on were:

  • createEntity, updateEntity, deleteEntity for mutation names
  • saveEntity for complex entity operations (create it if it doesn’t exist or update if it already exists)
  • entity to query an entity by id and entities to query entities using a variety of search criteria
  • Entity for type definition names
  • SearchEntity for the search criteria for a particular entity type

Deprecating Fields – Avoid Surprises

We use deprecated fields as a way to tag things that are going to go away.  The goal should be to keep deprecated fields for a couple major releases after tagging them as deprecated so you don’t surprise consumers when they disappear.  We also use deprecated fields as a way to tag experimental features.  These are warnings that the API is very likely to change and/or might only be partially implemented.  Again, the goal is to give consumers a hint so there are no surprises.

Architect For Reuse

Try to build architectural components that aid in resolving queries from similar data sources.  In our case, 60% of our current functionality uses existing REST APIs under the covers.  It was possible to build helpers to leverage the similarities that the REST APIs share so each query resolver, field resolver, sort order, etc. can leverage these reusable helpers instead of treating the implementation of each resolver as the first time.  We were able to build architectural components to help getting data from REST, direct SQL, Cypher queries to Neo4J and even another GraphQL endpoint.

Think About Testing

I recently blogged about our approach for creating a GraphQL integration testing framework.  Think about testing as you build out your API.  We created unit tests as developed the API, schema snapshot tests to catch accidental schema changes, and eventually full GraphQL integration tests that are completely independent of external datasources.  These tests will give you confidence that you haven’t broken things for your consumers as you iterate and improve your internal architecture.

Evangelize It

Teaching consumers about your awesome new GraphQL API is something you will want to do.  What is the point of developing such a wonderful and easy to use API if your customers, third-party integrators, professional services or internal developers don’t use it?  My advice for evangelizing any technology is the same – keep it simple.  GraphQL is new and often misunderstood.  Explain how your GraphQL API makes it super easy to get or manipulate data.  Explain how the hierarchical nature of GraphQL query results saves consumers from having to stitch results from several REST or DB calls together on the client side.  Keep it simple and approachable and pretty soon you’ll have consumers asking for enhancements, performance improvements and finding new and exciting ways to leverage your data!

 

GraphQL Integration Testing

graphql

Recently a colleague and I collaborated on a  unique approach to doing GraphQL integration testing.  Our unit testing setup already had the ability to test GraphQL resolver functions or helper functions used by the resolvers.  Our end to end testing can be done easily by bringing up a GraphQL server and running queries against it while it resolves those queries by accessing a real database or REST api.  What we designed for integration testing is somewhere in between.

Our approach gives you an end to end test of full GraphQL queries and mutations while automatically mocking external resources for you and comparing results against snapshotted data.

Goal: Test Full Queries With Easy Test Creation

The goal was to test a full query or mutation without using any external resources – no databases, no REST apis, etc.  We also wanted to make test creation easy – as easy as normal jest snapshot testing if possible.

Mock External Resources and Snapshot Results

We mocked external resources by running queries against a live system and capturing the interactions with external resources.  SQL queries and responses were recorded.  REST calls were recorded. After capturing the calls and responses we saved them in special directory similar to jest snapshots.  Each test case file would have a captured data file in the captured traffic directory.  When the test ran a 2nd time, the captured data would be used to prime the code where we made external calls by using sequealize and fetch wrapper objects.  When fetch is called, it uses the primed data to return the previously captured data. Likewise, when sequelize is called, the sequelize wrapper would use the previously primed data to return the previously captured data.

Start a GraphQL Server with Express Handlers

Next we started up our GraphQL server with some special express handlers that allowed us to prime or start/stop capture of interactions with external resources. We could have created GraphQL apis to prime/start/stop capture but we didn’t want to modify the public API that our customers interacted with.  Instead of modifying GraphQL to allow us to prime/start/stop capture we instead added conditional express handlers used only during integration testing runs.

First Run – Capture Data

When a test case is run without previously capture external data, the express server is told to start capture mode.  The query is run and the external interactions are captured.  After the query is run, the test case requests the just captured data and saves it in the captured data directory next to the test case.

Subsequent Runs – Use Captured Data To Prime Sequelize and Fetch Wrappers

The second time the test case is run, it primes the express server with the previously capture data, runs the query which uses the previously captured data for all external interactions and then compares the results with the snapshotted results.  This approach effectively tests all things inside GraphQL related to the query without using any external resources.

Putting It All Together

  • Start up a GraphQL server with special express handlers
  • Run all integration test queries/mutations against the GraphQL server just started
  • If a test is run without previously captured data:
    • Use the express handlers to start of capture of external interactions via REST or sequelize
    • Run the test
    • Get the captured data and save it in a special directory similar to jest snapshotted data
    • Use the express handlers to stop capture of external interactions
    • Run the next test
  • If a test is run with previously captured data:
    • Use the express handlers to prime the sequelize/REST wrappers with previously the captured data
    • Run the query/mutation which will use the previously captured data
    • Compare the results with the previously snapshotted results
    • Run the next test
  • Stop the GraphQL server

Improving GraphQL Performance

graphql

Providing Easy Access To Data

GraphQL is a data query language and a fantastic middleware layer that reduces round trips from the browser to get data and seamlessly hides the complexity of getting data from disparate data sources including REST apis, sql, nosql and graph databases as well as any other type of data storage.

Performance Can Be A Problem

Those are the upsides. The downside is that because it is so easy to get data from various sources it is possible to write GraphQL queries that generate hundreds or thousands of calls for underlying data.

DataLoaders Save The Day

GraphQL provides a mechanism to deal with all of these requests in a really elegant way. DataLoaders offer a way to load data for some key, which can be any string or id or object.  The DataLoader de-duplicates requests for identical information but they can do even better than that.  DataLoaders are invoked to load data for a single key (via the load function) or for multiple keys at a time (via the loadMany function).  Regardless of how the DataLoader is invoked, it queues up requests for data and only makes the request after all requests have been queued or after some arbitrary limit is hit.  This means that the requests for data can be consolidated into a single batch request for data.  Instead of asking for each piece of data by id, you can ask for all pieces of data for a set of ids.

These two properties of data loaders, de-duplication and batching, make them the perfect tool for improving GraphQL query performance.

Real World Example

Using DataLoaders, I recently was able to cut a heavy GraphQL query from taking 45 minutes and making hundreds of REST and thousands of sql database calls down to 4 seconds, making 20 REST and 20 sql database calls for data.

Conclusions

When you are looking to improve your GraphQL query performance, start by looking at the underlying calls for data.  Using DataLoaders you can de-duplicate and consolidate those calls for data and dramatically reduce query time.