Developed at The Children's Hospital of Philadelphia Research Institute

Complex Data Models

Handling complex data models sounds more attractive, but even simple ones can benefit from a really good data access layer API.

Perceivably flat data access layer

It is simpler to choose or search from a list of items than to traverse a relational data model or document store. Often you simply don't know what you're looking for, or when you do, you don't (and shouldn't) know where to look.
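A minimal sketch of what "perceivably flat" could mean in practice. The schema, table names, and the flatten helper are all hypothetical and only illustrate the idea: the user picks from one list of labels, and the layer remembers where each one lives.

```python
# Hypothetical relational model: table name -> column names.
SCHEMA = {
    "patient": ["gender", "birth_date"],
    "visit": ["visit_date", "diagnosis"],
}


def flatten(schema):
    """Return one flat, sorted list of (label, location) pairs."""
    items = []
    for table, columns in schema.items():
        for column in columns:
            label = column.replace("_", " ")
            items.append((label, f"{table}.{column}"))
    return sorted(items)


FLAT_FIELDS = flatten(SCHEMA)
for label, location in FLAT_FIELDS:
    print(label)  # the user sees only the label, never the location
```

The location is kept alongside each label so the access layer can still build the real query, while the user never has to traverse the model.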

Descriptive metadata (and domain specificity)

Merely having a data model and its constraints is not enough. The first barrier for users is figuring out what about the data they can search for. Humanize the data model by adding some descriptive love.

Free-text data model search

An intuitive and powerful search depends on the first two points mentioned above. The (highly descriptive) metadata as well as the structural metadata can be indexed and searched against directly. A match in this case would be a particular data point, whether it is for query or display purposes.
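As a rough sketch of searching the metadata directly, assume each data point carries a name, a human-written description, and a few keywords. The DataPoint class, the catalog entries, and the search function are made up for illustration, and a real system would more likely use a proper full-text index than substring matching.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    name: str          # structural metadata (hypothetical field name)
    description: str   # descriptive metadata written for humans
    keywords: tuple = ()


CATALOG = [
    DataPoint("gender", "Gender of the patient as recorded at registration", ("sex",)),
    DataPoint("birth_date", "Date the patient was born", ("dob",)),
]


def search(catalog, text):
    """Return data points whose metadata contains every query term."""
    terms = text.lower().split()
    matches = []
    for point in catalog:
        haystack = " ".join((point.name, point.description, *point.keywords)).lower()
        if all(term in haystack for term in terms):
            matches.append(point)
    return matches


print([point.name for point in search(CATALOG, "born")])  # ['birth_date']
```

Each match is a data point, not a row of data, so the result can feed either a query builder or a display view.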

Expand the search by including the data itself

To make the search more robust, discrete data can be indexed and associated with each data point as well. As an example, if I type male, the available query or view options would include gender. This enables users to find what they are looking for by directly searching for a known data value. One caveat to this concerns permissions. Even if a certain end user is not allowed to view certain data, a match occurring from typing male would potentially reveal that there are male data values.
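One way to picture this is an index from data values back to the data points that contain them. The values, the function names, and the allowed parameter below are assumptions. Filtering matches by the user's permitted data points is just one possible answer to the permissions caveat, not a prescribed one.

```python
from collections import defaultdict

# Hypothetical discrete values stored for each data point.
VALUES = {
    "gender": ["male", "female", "unknown"],
    "diagnosis": ["asthma", "influenza"],
}


def build_value_index(values):
    """Map each lowercased value to the data points that contain it."""
    index = defaultdict(set)
    for point, point_values in values.items():
        for value in point_values:
            index[value.lower()].add(point)
    return index


def search_values(index, text, allowed):
    """Return matching data points, dropping those the user may not view."""
    return sorted(index.get(text.lower(), set()) & set(allowed))


INDEX = build_value_index(VALUES)
print(search_values(INDEX, "male", allowed={"gender", "diagnosis"}))  # ['gender']
print(search_values(INDEX, "male", allowed={"diagnosis"}))            # []
```

With the filter in place, a user who cannot view gender gets no hint that male values exist at all.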

Humans aren't constrained, database schemas are (and for good reason)

Databases have data types to allow for fast and effective search on data. For example, you cannot query the string hello world using a numerical operator (at least not in a way that makes sense). For this reason, data are split up into multiple fields suiting the needs of the data. For example, when you view a cooking recipe, you would expect to read an ingredient such as 2 teaspoons of salt. What would happen if you only knew the ingredient name, i.e. salt? You wouldn't know how much of the ingredient you need. Likewise, if you only saw teaspoons without 2, you would not know the quantity of salt to add.

The power of the database comes from storing and indexing discrete values, which enables fast search and sorting capabilities. Humans, however, need to be able to view these discrete values in ways that mean something to them.

Large Data Sets

Large data sets should only benefit users in the sense that they have more data to explore. As with the comment on complexity, small data sets will work just fine here as well.

Usability must have an O(1) relationship to data size

The scale of the data must not tax its usability. Interfaces must be able to scale with the data transparently and not burden the user with too many options at once.

Get a sense of the data by showing aggregate statistics

Most data should be looked at on an aggregate level. If you choose to view the gender data, the appropriate view is a series of aggregate counts for male, female and unknown. This immediately gives the user a sense of the data. For example, if the user is interested in the male population, but the dataset has only a few males, the user can decide whether or not to continue.

These statistics can be thought of as another set of metadata, but this time computed from the data itself.
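A small sketch of computing such counts, using made-up gender values where None stands for a missing record. The aggregate_counts helper and the unknown label are assumptions for illustration.

```python
from collections import Counter

# Hypothetical raw values; None stands for a missing record.
genders = ["male", "female", "female", None, "female", "male"]


def aggregate_counts(values, missing_label="unknown"):
    """Count each distinct value, folding missing values into one label."""
    return Counter(missing_label if value is None else value for value in values)


counts = aggregate_counts(genders)
for label, count in counts.most_common():
    print(f"{label}: {count}")
```

The resulting counts are small regardless of how many rows were scanned, which is what lets them be stored and shown like any other metadata.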

Stats are good, visuals are better

This goes hand-in-hand with displaying aggregate statistics. Visuals such as histograms are used to display the distribution of data. This is particularly important for continuous data where simply listing min, max, mean, mode, standard deviation and variance is not good enough. Again, at this stage the goal is for a user to get a sense of the data before having to query or view it.
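To illustrate, here is a minimal equal-width histogram over a made-up list of ages, drawn as text bars. The histogram function, the bin count, and the sample values are assumptions. A real interface would render a chart rather than print characters.

```python
def histogram(values, bins):
    """Count values in equal-width bins spanning min(values) to max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for value in values:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return [(low + i * width, count) for i, count in enumerate(counts)]


ages = [2, 3, 3, 4, 5, 8, 9, 12, 15, 17]  # hypothetical sample
for start, count in histogram(ages, bins=3):
    print(f"{start:5.1f}+ {'#' * count}")
```

Even this crude picture shows the skew toward younger ages, which the mean and standard deviation alone would not make obvious.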

Domain Specificity

It's simple. Humans are more comfortable with the language and concepts they know and understand.

Group data to produce information

Earlier, a recipe ingredient was used to convey the importance of context and of being presented with all the necessary information. In most cases, data stored in a structured and constrained database system can be thought of as raw data and does not lend itself well to presentation. This data needs to be grouped together (like the ingredient) and formatted appropriately for display. If the ingredient name, quantity and unit were displayed out of order, that would cause confusion.
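The ingredient example can be sketched as three discrete fields grouped into one displayable unit. The Ingredient class and its display format are hypothetical. The point is only that the grouping and ordering live in one place.

```python
from dataclasses import dataclass


@dataclass
class Ingredient:
    # Stored separately so each field can be typed, indexed and searched.
    quantity: float
    unit: str
    name: str

    def display(self):
        """Format the grouped fields in the order a reader expects."""
        return f"{self.quantity:g} {self.unit} of {self.name}"


print(Ingredient(2, "teaspoons", "salt").display())  # 2 teaspoons of salt
```

The database keeps the quantity numeric for querying, while the reader only ever sees the assembled phrase.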