Semantic VistA

Mastering VistA's Data

The VA's VistA is the most widely deployed EHR in America and embodies over thirty years of design, trial and practice in healthcare IT. Semantic VistA is Caregraf's initiative to:

thoroughly understand VistA data in order to move meaningful datasets into and out of any live VistA, thereby enhancing patient care, easing system maintenance and repurposing VistA's know-how for the next generation of healthcare applications.

Traditionally, VistA is seen as a suite of packages: Pharmacy, Lab, Vital Measurement and so on. Semantic VistA takes a different tack, treating VistA as a health data manager built around a data schema that grew up over time. We need a full definition of this schema to move a variety of datasets out of and into live VistAs.

Figure: Semantic VistA pieces (c) Caregraf

Open then Analyze

Data management begins with access. VistA is built around a MUMPS-based data repository called FileMan: VistA data is FileMan data. The VA never created an SQL for FileMan, which comes with no remote query language and no formal schema-definition mechanism. FMQL, the FileMan Query Language, changes this. The open-source FMQL plugin lets you query the schema and data of any live VistA over a web-based interface. With FMQL, all VistA data (patient, institution, concept or system) and its schema are accessible through one straightforward mechanism.

Even with complete access, if you don't understand the shape of the data, you can't query it effectively. While FileMan's schema is comprehensive, it grew up over decades and was never designed as a whole. There are inconsistent ways of expressing the same idea, arrangements that suit one type of use but not others, and sometimes the schema is simply wrong: VistA packages often bypass schema checks, so some real data doesn't conform. The built-in schema also lacks key nuance. For example, it doesn't partition data types by function: is a data type part of a patient record, or does it define a concept? This package-centric schema doesn't say.
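
One way to recover the missing "patient record or concept?" distinction is to follow pointer fields: a file whose records lead, directly or through other files, to the PATIENT file (#2) holds patient data, and everything else defines concepts or setup. The sketch below assumes a simplified, hypothetical schema that maps each file name to the files its pointer fields target. It is a heuristic for illustration, not Caregraf's actual analytics, and real schemas will have exceptions that sample data must settle.

```python
# Hypothetical, heavily simplified schema: file name -> files its pointer fields target.
# Illustrative only; real FileMan files have many more fields and pointers.
SCHEMA = {
    "PATIENT": [],
    "DRUG": [],
    "HOSPITAL LOCATION": [],
    "PRESCRIPTION": ["PATIENT", "DRUG"],
    "ORDER": ["PATIENT", "HOSPITAL LOCATION"],
    "ORDER ACTION": ["ORDER"],  # hypothetical file that reaches PATIENT only via ORDER
}


def reaches_patient(file_name, schema, seen=None):
    """Depth-first search: does file_name lead to PATIENT by following pointers?"""
    if file_name == "PATIENT":
        return True
    if seen is None:
        seen = set()
    if file_name in seen:  # pointer cycles are possible, so guard against revisits
        return False
    seen.add(file_name)
    return any(reaches_patient(target, schema, seen) for target in schema.get(file_name, []))


def classify_files(schema):
    """Label every file as patient data or as concept/setup data (heuristic)."""
    return {
        name: "patient" if reaches_patient(name, schema) else "concept/setup"
        for name in schema
    }


for name, kind in sorted(classify_files(SCHEMA).items()):
    print(f"{name:20} {kind}")
```

A classification like this is a starting point; Analytics over sample data is what confirms or corrects it.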

To fully exploit FMQL access, both the FileMan schema and sample data showing how the schema is actually used need to be analyzed, a process we call Analytics. Add background from offline VA artifacts (specifications, release notes) and the result is a set of reports that define the Extended Schema required for comprehensive VistA data management.
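
A minimal sketch of one Analytics step: check sample records against the declared schema and tally where real data departs from it. The field names, declared types and sample records are hypothetical, and the DATE check is a simplification of FileMan's internal date format (YYYMMDD, where YYY is the year minus 1700, optionally followed by .HHMMSS).

```python
from collections import Counter

# Hypothetical declared schema for one file: field -> (declared type, declared required?)
DECLARED = {
    "name": ("FREE TEXT", True),
    "date_of_birth": ("DATE", True),
    "ward": ("POINTER", False),
}


def conforms(value, declared_type):
    """Simplified type checks on internal (stored) values."""
    if declared_type == "DATE":
        # FileMan internal date: YYYMMDD (YYY = year - 1700), optionally .HHMMSS
        day = value.split(".")[0]
        return len(day) == 7 and day.isdigit()
    if declared_type == "POINTER":
        return value.isdigit()  # a pointer stores the target record's entry number
    return True  # FREE TEXT and unknown types: accept anything


def analyze(samples, declared):
    """Tally, per declared field, how sample values compare with the schema."""
    report = {field: Counter() for field in declared}
    undeclared = Counter()
    for record in samples:
        for field, (ftype, required) in declared.items():
            value = record.get(field, "")
            if value == "":
                report[field]["missing_required" if required else "empty_optional"] += 1
            elif conforms(value, ftype):
                report[field]["ok"] += 1
            else:
                report[field]["wrong_type"] += 1
        undeclared.update(f for f in record if f not in declared)
    return report, undeclared


SAMPLES = [
    {"name": "PATIENT,ONE", "date_of_birth": "2900501", "ward": "12"},
    {"name": "PATIENT,TWO", "date_of_birth": "May 1 1990"},
    {"name": "PATIENT,THREE", "ward": "7", "nickname": "T"},
]

report, undeclared = analyze(SAMPLES, DECLARED)
for field, counts in report.items():
    print(field, dict(counts))
print("undeclared:", dict(undeclared))
```

A field declared required but routinely empty, or one whose values routinely fail the declared type, is a candidate for correction in the Extended Schema.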

Data out, to plan

With FMQL for access and the Extended Schema for definition, extracting Patient Records, VistA Know-how and Setups is straightforward.

Patient Records
In VistA, patient-specific information mixes workflow (orders, appointments, progress notes) with observations, procedures and diagnoses, in over five hundred types of data in all. Drive FMQL with an Analytics-enhanced schema definition and all or selected patient data can be crawled out of a system, much as Google intelligently crawls a website.
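
The crawl can be sketched as a breadth-first traversal that starts at one patient and follows pointers in both directions, but only expands records in files the Extended Schema marks as patient data; references into concept files such as drugs are noted rather than crawled. The in-memory store, file names and the PATIENT_FILES set are hypothetical stand-ins; a real crawler would issue FMQL queries instead of scanning a dictionary.

```python
from collections import deque

# Hypothetical stand-in for a live VistA: (file, internal entry number) -> fields.
# A pointer field holds a (file, ien) tuple.
STORE = {
    ("PATIENT", "9"): {"name": "PATIENT,ONE"},
    ("PATIENT", "10"): {"name": "PATIENT,TWO"},
    ("DRUG", "300"): {"generic_name": "ASPIRIN"},
    ("PRESCRIPTION", "55"): {"patient": ("PATIENT", "9"), "drug": ("DRUG", "300")},
    ("ORDER", "81"): {"patient": ("PATIENT", "9")},
    ("ORDER", "82"): {"patient": ("PATIENT", "10")},
}
# Assumed output of Extended Schema analytics: which files hold patient data.
PATIENT_FILES = {"PATIENT", "PRESCRIPTION", "ORDER"}


def pointers(fields):
    return [value for value in fields.values() if isinstance(value, tuple)]


def crawl_patient(store, patient_ien, patient_files):
    """Collect one patient's records plus the concept records they reference."""
    # Reverse index, so we can also walk from a patient to the records pointing at it.
    referrers = {}
    for node, fields in store.items():
        for target in pointers(fields):
            referrers.setdefault(target, []).append(node)

    start = ("PATIENT", patient_ien)
    patient_nodes, concept_refs = {start}, set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in pointers(store.get(node, {})) + referrers.get(node, []):
            file_name = neighbour[0]
            if file_name == "PATIENT" and neighbour != start:
                continue  # never wander into another patient's record
            if file_name not in patient_files:
                concept_refs.add(neighbour)  # know-how or setup: note it, don't crawl it
            elif neighbour not in patient_nodes:
                patient_nodes.add(neighbour)
                queue.append(neighbour)
    return patient_nodes, concept_refs


records, concepts = crawl_patient(STORE, "9", PATIENT_FILES)
print(sorted(records))
print(sorted(concepts))
```

Keeping concept references separate pays off later: they are the know-how a target system must already have before the patient data can go in.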

Know-how
VistA embodies decades of healthcare knowledge, know-how that should be reused in budding cloud-based health applications. Why invent when you can "emulate"? FMQL provides access to this heart of VistA: its concepts, their definitions and relationships, and how they are used. VistA's drug model, Lexicon, document templating, order model and more are open to all.

Setups
As with other VistA data, FMQL exposes a system's setup: all locations and wards, doctors and drug selections. This eases VistA system management and can help configure other systems that need the same information.

These data collections are captured in RDF-formatted FileMan Data Graphs (FDGs). RDF is the W3C standard for encoding knowledge, a flexible format well suited to FileMan's mix of hierarchical and graph-structured data. FMQL and FDGs are two sides of the VistA data coin: FMQL provides universal access; FDGs, a universal format. FDGs can be loaded and queried in any of the many commercial and open-source RDF triple stores, or transformed into document formats like CCDs or into the formats of third-party systems that need VistA data.
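
To make the idea of an FDG concrete, here is a sketch that serializes one FileMan-style record as RDF in N-Triples syntax. Pointers become links to other resources, plain values become literals, and a multiple (a nested list of sub-entries, FileMan's hierarchical side) becomes blank nodes. The base namespace http://example.org/vista/ and the URI naming scheme are assumptions for illustration, not Caregraf's actual FDG vocabulary.

```python
import itertools

BASE = "http://example.org/vista/"  # hypothetical namespace, not the real FDG vocabulary
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def slug(text):
    return text.replace(" ", "_")


def resource(file_name, ien):
    return f"<{BASE}{slug(file_name)}-{ien}>"


def literal(value):
    # Escape per N-Triples string rules: backslash, double quote, newline, carriage return.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(file_name, ien, fields):
    subject = resource(file_name, ien)
    lines = [f"{subject} {RDF_TYPE} <{BASE}{slug(file_name)}> ."]
    blank_ids = itertools.count(1)

    def emit(subj, flds):
        for field, value in flds.items():
            predicate = f"<{BASE}{slug(field)}>"
            if isinstance(value, tuple):  # pointer -> link to another record
                lines.append(f"{subj} {predicate} {resource(*value)} .")
            elif isinstance(value, list):  # multiple -> one blank node per entry
                for entry in value:
                    bnode = f"_:b{next(blank_ids)}"
                    lines.append(f"{subj} {predicate} {bnode} .")
                    emit(bnode, entry)
            else:  # plain value -> literal
                lines.append(f"{subj} {predicate} {literal(str(value))} .")

    emit(subject, fields)
    return "\n".join(lines)


print(record_to_ntriples("PRESCRIPTION", "55", {
    "rx_number": "1001",
    "patient": ("PATIENT", "9"),
    "fill": [{"fill_date": "3110105"}],
}))
```

Output in this form can be loaded into any triple store that accepts N-Triples and queried alongside graphs from other systems.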

Data in, from a more detailed plan

We want to insert VistA datasets as easily as we extract them. Rather than set up a VistA through its roll-and-scroll interface, filling in screen option after screen option, we want to insert a Setup FDG in one shot. Rather than update a VistA's concept definitions in different ways, we want to load Know-how FDGs. And we want to insert patient data from complementary EHRs, such as the workflow and results of outside Radiology, Pharmacy or Lab systems, as a graph. But data insertion is more involved than data extraction.

While graph insertion is well understood, FileMan features and quirks introduce complications. Some data is created by side effect, some values must be closely related or identical, and there are places where the schema differs from real data, declaring optional fields mandatory or giving a field the wrong type. A schema definition enhanced for extraction needs more detail to drive insertion.
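
The well-understood part of graph insertion can be sketched as follows: order the graph's records so that every record a pointer targets is inserted before the record holding the pointer, give each new record an entry number in the target system, and rewrite pointers to the new numbers. This sketch uses Python's standard graphlib module; the next_ien counters and the data are hypothetical, and it deliberately ignores the FileMan complications just described (side-effect records, linked values, schema errors), which is where an enhanced schema comes in.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def insertion_order(graph):
    """Order records so every pointer target inside the graph comes first."""
    sorter = TopologicalSorter()
    for node, fields in graph.items():
        targets = [v for v in fields.values() if isinstance(v, tuple) and v in graph]
        sorter.add(node, *targets)
    return list(sorter.static_order())  # raises graphlib.CycleError on pointer cycles


def insert_graph(graph, next_ien):
    """Simulate insertion: assign new entry numbers and rewrite in-graph pointers.

    next_ien is a hypothetical per-file counter standing in for the target system.
    Pointers to records outside the graph are assumed to exist in the target already.
    """
    new_ids, inserted = {}, []
    for node in insertion_order(graph):
        file_name = node[0]
        fields = {
            k: new_ids.get(v, v) if isinstance(v, tuple) else v
            for k, v in graph[node].items()
        }
        ien = next_ien.setdefault(file_name, 1)
        next_ien[file_name] = ien + 1
        new_ids[node] = (file_name, str(ien))
        inserted.append((new_ids[node], fields))
    return inserted


GRAPH = {
    ("PRESCRIPTION", "55"): {"patient": ("PATIENT", "9"), "drug": ("DRUG", "17")},
    ("PATIENT", "9"): {"name": "PATIENT,ONE"},
}
for new_id, fields in insert_graph(GRAPH, {"PATIENT": 100, "PRESCRIPTION": 500}):
    print(new_id, fields)
```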

On top of an enhanced schema definition, patient-data insertion demands a full picture of a system's know-how, even for transfers between VistAs. Where a drug is known to a source system but unsupported in a target system, no prescription or order with that drug can be transferred, and even where a drug is supported in both, it will have a different identifier in each. A transfer into VistA from a non-VistA source introduces another complication: does the source have all the information VistA requires?
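
A sketch of the know-how check for transferred orders: map each source drug to a target drug through some shared key, then split orders into those that can be rewritten with the target's identifiers and those that must be held back. Matching on a normalized drug name is an assumption made for brevity; a real mapping would more likely rely on a standard drug code carried by both systems. All identifiers below are invented.

```python
def normalize(name):
    """Hypothetical matching key: upper-cased name with collapsed whitespace."""
    return " ".join(name.upper().split())


def map_drugs(source_drugs, target_drugs):
    """source/target: drug ien -> name. Returns source ien -> target ien (or None)."""
    by_key = {normalize(name): ien for ien, name in target_drugs.items()}
    return {ien: by_key.get(normalize(name)) for ien, name in source_drugs.items()}


def partition_orders(orders, drug_map):
    """Rewrite orders whose drug the target supports; hold back the rest."""
    transferable, blocked = [], []
    for order in orders:
        target_ien = drug_map.get(order["drug"])
        if target_ien is None:
            blocked.append(order)  # drug unknown to the target: cannot transfer
        else:
            transferable.append({**order, "drug": target_ien})  # same drug, new identifier
    return transferable, blocked


SOURCE_DRUGS = {"300": "Aspirin 81MG TAB", "301": "LOCALDRUG 5MG"}
TARGET_DRUGS = {"17": "ASPIRIN  81MG TAB"}
ORDERS = [{"order": "a", "drug": "300"}, {"order": "b", "drug": "301"}]

drug_map = map_drugs(SOURCE_DRUGS, TARGET_DRUGS)
transferable, blocked = partition_orders(ORDERS, drug_map)
print("transfer:", transferable)
print("blocked:", blocked)
```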

How far along is data insertion? The patient and system information in Caregraf's demo VistA and our internal test systems started out as FDGs. However, more analysis and schema refinement are needed before insertion is as easy as access.

A Bonus - "One VistA" with OSEHRA

While Semantic VistA focuses on VistA data management, FMQL access to all of VistA's data enables other work too. OSEHRA, the VA's official open-source VistA custodian, has a Code Convergence effort to establish one "best of breed" VistA code base.

There is no one VistA today. Even FOIA VistA, the publicly available version of the VA's internal master, doesn't represent an official version. All running VistAs, both inside and outside the VA, are specific collections of packages: many are old variations of what's in FOIA, some are home-grown and absent from FOIA, and some are variations of official VA releases. With One VistA, the system would move from a flat list of packages to a versioned core with strictly delineated add-ons. How? Analytics of the differences between VistA versions is a good start.
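
A first cut at such Analytics might compare the package inventories of a reference system (say FOIA) and a site system, sorting packages into missing, home-grown, older and newer. The package names and version numbers are illustrative, and the inventories are assumed to have been pulled from each system already. Equal version numbers can still hide local modifications, so a fuller comparison would also examine the code itself.

```python
def version_key(version):
    """'5.2' -> (5, 2), so versions compare numerically rather than as strings."""
    return tuple(int(part) for part in version.split("."))


def diff_inventories(reference, site):
    """Compare package -> version inventories of a reference VistA and a site VistA."""
    shared = set(reference) & set(site)
    return {
        "missing_at_site": sorted(set(reference) - set(site)),
        "not_in_reference": sorted(set(site) - set(reference)),  # e.g. home-grown packages
        "older_at_site": sorted(
            p for p in shared if version_key(site[p]) < version_key(reference[p])
        ),
        "newer_at_site": sorted(
            p for p in shared if version_key(site[p]) > version_key(reference[p])
        ),
    }


REFERENCE = {
    "LAB SERVICE": "5.2",
    "ORDER ENTRY/RESULTS REPORTING": "3.0",
    "PHARMACY DATA MANAGEMENT": "1.0",
}
SITE = {
    "LAB SERVICE": "5.2",
    "ORDER ENTRY/RESULTS REPORTING": "2.5",
    "LOCAL CLINIC TOOLS": "1.0",  # hypothetical home-grown package
}

for category, packages in diff_inventories(REFERENCE, SITE).items():
    print(f"{category}: {packages}")
```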

To learn more