In our previous article in the series, we discussed challenges associated with managing storage capacity for PACS inside a virtual environment. While utilization of storage capacity remains one of the most difficult tasks due to its cost and compliance implications for the hospital, we also need to think about how our users are actually going to access the capacity itself when requiring patient information and follow-ups.
It all boils down to IOPS: where, when, and how is my PACS workload requesting, writing, and reading image data from my repositories? Couple this with dynamic requirements to share data across locations and departments, and we have a big latency problem on our hands.
Traditional PACS implementations make this very difficult because image data is stored in disparate archive repositories, typically split up by location and image type (i.e., separate repositories for dental, brain, cardiovascular, ultrasound, department Y, or department X). As a result, many organizations find themselves with a significant amount of unused storage and high latency on requests that cross siloed infrastructures.
Breaking Down the Silo with VNA
VNA (vendor-neutral archive) was designed for this exact reason: to mitigate the risk associated with disparate repositories by standardizing image data in software and centralizing it into a single repository. But a VNA can cost up to 3x as much as a traditional PACS implementation, and for EMR adopters and hospitals sensitive to budget constraints, it is not always a viable option. Even if a VNA allows for more flexible archiving, hospitals are still responsible for hosting it on a storage platform that must do the heavy lifting of meeting IOPS demands continuously and in real time.
Let’s continue our previous scenario with NFS storage running on a NetApp backend. Let’s assume my hospital has chosen to go the VNA route when I designed my aggregates in our previous discussion (SAS based shelves for high performance loads and SATA shelves with CIFS archives for long term data use).
When I designed the original architecture, I asked myself the obvious question: what IO loads should I anticipate from my PACS workloads in real time? This led us to pursue considerable testing with our PACS vendor and NetApp to ensure appropriate RAID policies and architecture design for storage pools/arrays, including flash cache, deduplication, and compression.
Then the highly anticipated day came when we put our PACS in flight and allowed access for our users, doctors, and patients. Strangely, I found myself asking the same, but slightly different, question that I had asked when I first designed my storage platform: what IO loads will I SEE from my PACS workloads in real time?
Now in reality, we are drastically oversimplifying this by ignoring database considerations, WAN connections, PCoIP settings for the desktop, and much more. For now we are solely focused on how the desktop retrieves, renders, and delivers image data to the user from an infrastructure resource perspective: namely storage IOPS and latency. Let’s begin with a typical use case around boot/login storms.
My doctors/patients begin to access their PACS desktops around 9am for the start of the hospital’s exams. This causes a huge spike in read-intensive IOPS during login and boot that ripples across my desktops and underlying NetApp storage. Due to the random nature of these IO requests and the distribution of my desktops across my underlying storage, I am finding it hard to pinpoint “who” the bully VMs are in my environment as latency alerts filter through my inbox.
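One way to reason about “who” the bullies are is to group per-VM IO samples by datastore, flag the datastores whose IOPS-weighted latency is over a threshold, and rank the VMs on them by their share of total IOPS. The sketch below assumes the samples have already been exported from whatever monitoring tool is in use; the `VmSample` fields, the 20 ms threshold, and the VM and datastore names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VmSample:
    """One observation for a desktop VM (hypothetical field names)."""
    vm: str
    datastore: str
    read_iops: float
    write_iops: float
    latency_ms: float


def find_bullies(samples, latency_threshold_ms=20.0, top_n=3):
    """For each congested datastore, return the top IOPS consumers.

    A datastore counts as congested when its IOPS-weighted average
    latency meets or exceeds the (assumed) threshold. The result maps
    datastore name -> list of (vm name, share of datastore IOPS).
    """
    by_datastore = defaultdict(list)
    for sample in samples:
        by_datastore[sample.datastore].append(sample)

    bullies = {}
    for datastore, group in by_datastore.items():
        total_iops = sum(s.read_iops + s.write_iops for s in group)
        if total_iops == 0:
            continue  # idle datastore, nothing to rank
        weighted_latency = sum(
            s.latency_ms * (s.read_iops + s.write_iops) for s in group
        ) / total_iops
        if weighted_latency < latency_threshold_ms:
            continue  # healthy datastore
        ranked = sorted(
            group, key=lambda s: s.read_iops + s.write_iops, reverse=True
        )
        bullies[datastore] = [
            (s.vm, round((s.read_iops + s.write_iops) / total_iops, 2))
            for s in ranked[:top_n]
        ]
    return bullies
```

Calling `find_bullies` on samples captured during the 9am login window would return only the datastores that are actually congested, with the heaviest desktops listed first. A single snapshot like this says nothing about whether a VM is a bully all day or only during boot, so in practice the ranking would need to be repeated over time.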
If left unaddressed, we will continue to observe recurring latency on our PACS workloads whenever a desktop client tries to access its DICOM imaging information during exam updates. This could lead to noticeable performance constraints such as artificial shutters on radiology images, incorrect display of patient information, or loss of data itself. For doctors who have recurring appointments with patients and need to access critical patient information frequently, this presents a significant loss of productivity in treating illness.
In addition to accessing radiology data that already exists, we also have to consider net-new scans that are being performed for the first time. These inherently require more write IOPS as new imaging data gets written to our underlying data repositories. High latency during these write operations could produce equally negative outcomes for doctors and patients. Consider what would happen if someone was undergoing a brain scan for neurological diagnosis and the PACS workload was unable to render this data accurately. The doctor may need to re-run the scan, which costs time for both doctor and patient and, for modalities that use ionizing radiation such as CT (MRI does not), means additional radiation exposure.
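To get a rough feel for the write side, the back-of-the-envelope arithmetic can be sketched as follows. The study size, IO size, and scan rate used here are placeholders rather than measured values, and the result is only a sustained average: it ignores burstiness, RAID write penalties, and any caching in front of the disks.

```python
import math


def write_ops_for_study(study_size_mb: float, io_size_kb: float) -> int:
    """Number of write operations needed to land one study on disk,
    assuming every write is exactly io_size_kb (a simplification)."""
    return math.ceil(study_size_mb * 1024 / io_size_kb)


def average_write_iops(
    studies_per_hour: float, study_size_mb: float, io_size_kb: float
) -> float:
    """Sustained average write IOPS for a steady stream of new studies."""
    ops_per_hour = studies_per_hour * write_ops_for_study(
        study_size_mb, io_size_kb
    )
    return ops_per_hour / 3600
```

For instance, with a hypothetical 500 MB study written in 64 KB IOs, `write_ops_for_study(500, 64)` comes to 8000 writes, and 36 such studies per hour average out to 80 write IOPS. Real modalities write in bursts, so the peak the storage sees will be well above that average.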
Not every fix is a solution...
While adding more spindles and storage capacity to my environment seems like an easy solution, it will not prevent all forms of latency if I don’t truly understand where workload demands are accessing this capacity. To make the best use of the current environment, I had originally chosen to tier my NetApp storage across SATA, SAS, and Flash-cache enabled storage pools, but based on changing workload demands and PACS growth, I will need to constantly go back and keep track of this as my environment scales. As a function of preventing risk, I also need to make real time decisions on how to intelligently place PACS workloads across available storage LUNs/Pools where they have the best access to the performance that they need on demand.
While Storage vMotion allows me to do this with zero downtime, I am still the operator in charge of determining how, when, why, and where this happens. Where should I move the workload to? Which one should I move: the high IO consumer or several low consumers? If I move the workload because of IOPS or latency, am I positive that I am not causing space congestion somewhere else? Does the aggregate even have enough IOPS capacity on disk to support this successfully, or am I simply moving the problem somewhere else? In an environment with dozens to hundreds of PACS desktops running in real time, we cannot expect human beings to master this game of control. There are simply too many variables and trade-offs to consider in real time, and the result is that we typically make decisions after a problem has already occurred in our environment.
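The questions above can be framed as a small placement check: before moving a workload, project both the IOPS and the space utilization of each candidate datastore after the move, and reject any destination where either one would cross a ceiling. The sketch below is a simplified illustration of that trade-off, not a description of how any particular product decides. The `Datastore` and `Workload` fields and the 80% ceiling are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Datastore:
    """Capacity and current usage of a candidate destination (hypothetical)."""
    name: str
    iops_capacity: float
    iops_used: float
    space_capacity_gb: float
    space_used_gb: float


@dataclass
class Workload:
    """Resource demand of the desktop being moved (hypothetical)."""
    name: str
    iops: float
    space_gb: float


def utilization_after_move(ds: Datastore, wl: Workload) -> Tuple[float, float]:
    """Projected (IOPS utilization, space utilization) if wl lands on ds."""
    iops_util = (ds.iops_used + wl.iops) / ds.iops_capacity
    space_util = (ds.space_used_gb + wl.space_gb) / ds.space_capacity_gb
    return iops_util, space_util


def pick_destination(
    wl: Workload, candidates: Iterable[Datastore], max_util: float = 0.8
) -> Optional[Datastore]:
    """Pick the candidate whose worst projected utilization is lowest.

    Returns None when every candidate would exceed max_util on either
    IOPS or space, i.e. when moving would only relocate the problem.
    """
    best, best_score = None, None
    for ds in candidates:
        iops_util, space_util = utilization_after_move(ds, wl)
        if iops_util > max_util or space_util > max_util:
            continue  # would create IOPS or space congestion
        score = max(iops_util, space_util)
        if best_score is None or score < best_score:
            best, best_score = ds, score
    return best
```

Even this toy version shows why the decision is hard to make by hand: a datastore with plenty of IOPS headroom can still be rejected on space, and the answer changes every time demand shifts. It also only considers one workload at a time, whereas the real question of moving one high consumer versus several low ones is a combinatorial problem.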
In order to truly meet the demands of our PACS workload, software must evolve to solve this for us. This requires an understanding of the ongoing resource demands for our PACS desktops and the ability for software to bridge this demand to the available supply in our environment continuously and in real time. At the same time, placement decisions must be made with an awareness of sizing, capacity and future growth.
In our next article, let’s explore how DR compliance compounds the challenge of maintaining a PACS infrastructure. When we tie all three of these objectives together, it will be interesting to ponder: can we truly expect humans to continuously keep the environment in a state where performance is assured while utilizing it as efficiently as possible?
Image source: Tristan Cobb, Turbonomic