Process development and manufacturing science teams are responsible not only for efficiently developing high-performing bioprocesses and ensuring smooth scale-up, but also for keeping commercial production under control. Unified, reliable data management and analysis is therefore vital to successful drug development and production.
If you work in bioprocess development, process scale-up validation, or manufacturing excellence, you will typically face three main challenges:
- Where the data is and how to get it
- How to consolidate data and make it available for analysis
- How to analyze the data
You may also wonder: How can you get a complete and reliable overview of your process data with less effort? How do you consolidate large volumes of data from different sensors into a single database? How do you combine data from small-scale and large-scale runs to validate models across scales? Which methods are suitable for analyzing pharmaceutical data and statistically evaluating bioprocesses?
Senieer offers solutions to all of these challenges, covering bioprocess data management, bioprocess data visualization, and statistical techniques for evaluating bioprocesses.
In this article, we will take a detailed look at data management.
Data Management
As a process engineer, you work with large amounts of sensor data (e.g., pH, temperature, and dissolved oxygen), product quality data (e.g., product concentration, specific activity, and relative potency), and non-numerical data such as images (e.g., scanned SDS-PAGE gels). For every new analysis, data from these different sources has to be reorganized by hand, which is time-consuming.
Your first and most critical challenge is bringing all of your data together in a single source. Data is typically scattered across the company: in different departments, on different devices, and in different storage systems. You will also often encounter several types of data, each of which must be processed differently.
We divide the data into two broad categories:
Time Series Data: Data recorded over time, that is, each value has a corresponding timestamp
It can be further classified as:
Data Recorded Online By The System
- Usually recorded at high temporal resolution
- Can usually be exported in a defined export format
Manually Recorded Data
- Usually recorded at low temporal resolution
- Often captured in a different format from the online data
Feature Data (F): Single-point data, that is, a single value rather than a series over time.
It can be further classified as:
Scalar Features
A physical quantity that is completely described by its magnitude
Categorical Features
Assigns a run or unit operation to a specific group or nominal category
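To make the two categories concrete, here is a minimal sketch of how they could be represented in code. It is an illustration under our own assumptions, not Senieer's data model: the class names (`TimeSeries`, `ScalarFeature`, `CategoricalFeature`, `Run`), the `origin` label, and all values are hypothetical. The `mean_interval_s` helper shows why online and manual data differ in practice: their sampling intervals can be orders of magnitude apart.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple, Union


@dataclass
class TimeSeries:
    """A variable recorded over time: every value carries its own timestamp."""
    variable: str
    unit: str
    origin: str  # hypothetical label: "online" (system-recorded) or "manual"
    samples: List[Tuple[datetime, float]] = field(default_factory=list)

    def mean_interval_s(self) -> float:
        """Average spacing between consecutive samples (a rough temporal resolution)."""
        times = sorted(t for t, _ in self.samples)
        if len(times) < 2:
            raise ValueError("need at least two samples")
        return (times[-1] - times[0]).total_seconds() / (len(times) - 1)


@dataclass
class ScalarFeature:
    """A single value fully described by its magnitude (plus a unit)."""
    name: str
    value: float
    unit: str


@dataclass
class CategoricalFeature:
    """A single value that assigns the run to a group or nominal category."""
    name: str
    category: str


@dataclass
class Run:
    run_id: str
    timeseries: List[TimeSeries] = field(default_factory=list)
    features: List[Union[ScalarFeature, CategoricalFeature]] = field(default_factory=list)


# Illustrative values only: online pH every minute, manual OD600 every 4 hours.
start = datetime(2024, 1, 1, 8, 0)
run = Run(
    run_id="R001",
    timeseries=[
        TimeSeries("pH", "-", "online",
                   [(start + timedelta(minutes=m), 7.0) for m in range(121)]),
        TimeSeries("OD600", "-", "manual",
                   [(start, 0.5), (start + timedelta(hours=4), 2.1)]),
    ],
    features=[
        ScalarFeature("final_titer", 1.8, "g/L"),
        CategoricalFeature("cell_line", "CHO-K1"),
    ],
)

for ts in run.timeseries:
    print(ts.variable, ts.origin, ts.mean_interval_s())
```

Running it prints a 60-second interval for the online pH signal and a 14,400-second interval for the manual OD600 readings.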
The challenge with time series data is that, even within a single run, the signals are often recorded by different systems at different temporal resolutions. In addition, to compare multiple runs with one another, the data must be aligned to a specific event (e.g., the inoculation time of a fermentation or the start of elution in chromatography).
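Both steps, aligning to an event and bringing signals to a common resolution, can be sketched in a few lines. This is a minimal illustration assuming linear interpolation onto a shared grid, which is only one reasonable choice (step-hold or averaging are common alternatives, and interpolating sparse offline data should be done with care). The function names and data values are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def to_process_time(samples, event_time):
    """Shift (timestamp, value) pairs to (hours since the reference event, value)."""
    return sorted(((ts - event_time).total_seconds() / 3600.0, v) for ts, v in samples)


def interpolate(series, t):
    """Linearly interpolate a sorted [(t, value)] series at t; None outside its range."""
    times = [p[0] for p in series]
    if not times or t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)  # first index with times[i] >= t
    if times[i] == t:
        return series[i][1]
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def resample(series, grid):
    """Evaluate a series on a common time grid so signals can be compared point by point."""
    return [interpolate(series, t) for t in grid]


# Illustrative data: online pH every 30 min (from 1 h before inoculation), manual OD600 every 2 h.
inoculation = datetime(2024, 1, 1, 8, 0)
ph_online = [(inoculation + timedelta(minutes=30 * k), v)
             for k, v in enumerate([7.00, 7.00, 7.00, 6.95, 6.90, 6.88, 6.85], start=-2)]
od_manual = [(inoculation, 0.5), (inoculation + timedelta(hours=2), 1.5)]

grid = [0.0, 1.0, 2.0]  # hours after inoculation
ph_aligned = resample(to_process_time(ph_online, inoculation), grid)
od_aligned = resample(to_process_time(od_manual, inoculation), grid)
print(ph_aligned)  # [7.0, 6.9, 6.85]
print(od_aligned)  # [0.5, 1.0, 1.5]
```

Because every run is expressed in hours since its own inoculation, runs started at different clock times end up on the same axis and can be overlaid directly.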
Combining all types of data yields the most information, but it requires well-organized data. Done manually, this is time-consuming and must be repeated every time new data is added. As a result, traditional data analytics projects tend to spend about 80% of the effort on collecting and aligning data and only 20% on actual analysis.
One possible solution is to use an existing supervisory system. In fermentation, for example, SCADA software can be used to align data, and it can often be connected to third-party devices such as off-gas analyzers and balances. Some tools even allow manually recorded data to be added. When exporting, make sure all signals are brought to the same temporal resolution: fine enough to retain the relevant information, yet coarse enough to keep the data volume manageable. It is advisable to define an SOP for exports from such systems so that variable names and data formats stay consistent.
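An export SOP can be partly enforced in code. The sketch below assumes a CSV export with a `time_s` column in seconds; the `NAME_MAP` entries are hypothetical examples of system-specific tag names, not names used by any particular SCADA product. It renames columns to agreed variable names, refuses columns the SOP does not cover, and averages samples into fixed bins (here 60 s) so that every exported signal has the same temporal resolution. Averaging is one assumption; taking the last value per bin is an equally valid convention.

```python
import csv
import io

# Hypothetical SOP mapping: system-specific export column names -> agreed variable names.
NAME_MAP = {
    "PH_PV": "pH",
    "pH value": "pH",
    "TEMP_PV": "temperature_C",
    "Temp [degC]": "temperature_C",
    "DO_PV": "DO_percent",
}


def normalize_export(text, time_col="time_s", interval_s=60):
    """Rename columns per the SOP and average samples into fixed time bins."""
    bins = {}
    for row in csv.DictReader(io.StringIO(text)):
        t = float(row[time_col])
        start = int(t // interval_s) * interval_s  # left edge of the bin
        bucket = bins.setdefault(start, {})
        for col, raw in row.items():
            if col == time_col or raw in ("", None):
                continue  # skip the time column and empty cells
            name = NAME_MAP.get(col)
            if name is None:
                # Fail loudly instead of silently introducing an unagreed variable name.
                raise KeyError(f"column {col!r} is not covered by the naming SOP")
            bucket.setdefault(name, []).append(float(raw))
    return [
        {"time_s": start, **{n: sum(v) / len(v) for n, v in bins[start].items()}}
        for start in sorted(bins)
    ]


export = "time_s,PH_PV,TEMP_PV\n0,7.0,37.0\n30,7.5,37.5\n60,7.1,36.5\n"
for row in normalize_export(export):
    print(row)
```

The two samples at 0 s and 30 s fall into the first 60-second bin and are averaged; the sample at 60 s starts the next bin.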
Best-Practice Databases Meet The Following Requirements:
1) An appropriate data model for storing all bioprocess-related data in a common database
2) The ability to add metadata to time series data (e.g., definitions of processes and events)
3) Management of data preprocessing workflows
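Requirements 1 and 2 can be illustrated with a deliberately small relational schema. The sketch below uses Python's built-in `sqlite3` module; the table layout, column names, and the `scale_L` feature are our own assumptions for illustration and are not Senieer's data model. Events are stored as metadata alongside the time series, so a single query can return a variable in hours since inoculation across runs, optionally filtered by a scalar feature such as working volume.

```python
import sqlite3

# Hypothetical minimal schema: runs, event metadata, time series and features in one database.
SCHEMA = """
CREATE TABLE run (run_id TEXT PRIMARY KEY);
CREATE TABLE event (run_id TEXT REFERENCES run(run_id), name TEXT, t_s REAL);
CREATE TABLE timeseries (run_id TEXT REFERENCES run(run_id), variable TEXT, t_s REAL, value REAL);
CREATE TABLE feature (run_id TEXT REFERENCES run(run_id), name TEXT,
                      num_value REAL, cat_value TEXT);
"""


def aligned_series(conn, variable, event_name, scale_min_l=0.0):
    """One variable across runs, in hours since an event, filtered on the scale_L feature."""
    return conn.execute(
        """
        SELECT ts.run_id, (ts.t_s - ev.t_s) / 3600.0 AS t_h, ts.value
        FROM timeseries AS ts
        JOIN event AS ev ON ev.run_id = ts.run_id AND ev.name = ?
        JOIN feature AS f ON f.run_id = ts.run_id AND f.name = 'scale_L'
        WHERE ts.variable = ? AND f.num_value >= ?
        ORDER BY ts.run_id, t_h
        """,
        (event_name, variable, scale_min_l),
    ).fetchall()


# Illustrative data: a 2 L and a 200 L run, inoculated at different clock times.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO run VALUES (?)", [("R1",), ("R2",)])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                 [("R1", "scale_L", 2.0, None), ("R2", "scale_L", 200.0, None),
                  ("R1", "unit_operation", None, "fermentation"),
                  ("R2", "unit_operation", None, "fermentation")])
conn.executemany("INSERT INTO event VALUES (?, ?, ?)",
                 [("R1", "inoculation", 3600.0), ("R2", "inoculation", 7200.0)])
conn.executemany("INSERT INTO timeseries VALUES (?, ?, ?, ?)",
                 [("R1", "pH", 3600.0, 7.0), ("R1", "pH", 7200.0, 6.9),
                  ("R2", "pH", 7200.0, 7.0), ("R2", "pH", 10800.0, 6.95)])

print(aligned_series(conn, "pH", "inoculation"))
print(aligned_series(conn, "pH", "inoculation", scale_min_l=100.0))
```

Keeping features (numeric or categorical) in one long table means new feature types can be added without schema changes; the trade-off is that each filter needs its own join or subquery. Requirement 3, managing preprocessing workflows, is outside the scope of this sketch.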
With a unique data model, Senieer aligns and contextualizes all data from MES, ELN, LIMS, DCS, historians, data lakes, and other standalone devices and makes it available on a single platform. Database filters can be customized to help you quickly identify relevant batches, unit operations, and data types, and to create your own datasets.