Prediction Quality: The 4C’s of Source Data Evaluation

Prediction quality is not the same thing as data quality. Last week, we suggested that the idea that “more data is always better” must be tempered with a thoughtful assessment of what additional information that source data actually provides about your well. The goal is not more data, but better data. It seems simple: the better the source data, the better the resulting predictions or recommendations will be. Think of data as the rocket fuel for an artificial intelligence journey. Below, we define four dimensions that can be used to evaluate source data.

Coverage: The total amount of source data available for our wells. The source data an AI solution requires goes beyond the sensor stream.

Continuity: How much source data is available without gaps or lapses. This is very important when reviewing sensor streams or the available set of dynacards for rod pumps.
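
As a rough illustration of a continuity check, the sketch below flags stretches where consecutive readings in a sensor stream are further apart than an allowed threshold. The function name `find_gaps`, the 15-minute threshold, and the sample timestamps are all assumptions made for the example, not part of any particular system.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive readings are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical sensor readings: every 10 minutes, with one long lapse.
readings = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 10),
    datetime(2024, 1, 1, 0, 20),
    datetime(2024, 1, 1, 2, 0),
    datetime(2024, 1, 1, 2, 10),
]

# Assumed threshold: anything longer than 15 minutes counts as a gap.
gaps = find_gaps(readings, max_gap=timedelta(minutes=15))
for start, end in gaps:
    print(f"Gap from {start} to {end} ({end - start})")
```

The same idea could be applied to a set of dynacards by using the card timestamps in place of sensor readings.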

Consistency: Frequency of updates or new values in a time series data stream. Updates must arrive more often than the indications of the failure the solution is attempting to predict; otherwise the warning signs can pass between readings unnoticed.
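
One way to make this concrete is to compare the typical interval between updates with the time over which a failure indication develops. The sketch below is a minimal example under stated assumptions: `failure_window` (how long the warning signs take to appear) and `min_samples` (how many readings we want inside that window) are hypothetical parameters chosen for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def median_update_interval(timestamps):
    """Median time between consecutive updates, as a timedelta."""
    ordered = sorted(timestamps)
    seconds = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return timedelta(seconds=median(seconds))


def is_frequent_enough(timestamps, failure_window, min_samples=4):
    """True if at least min_samples typical updates fit inside failure_window."""
    return median_update_interval(timestamps) * min_samples <= failure_window


# Hypothetical stream that updates every 10 minutes.
start = datetime(2024, 1, 1)
stream = [start + timedelta(minutes=10 * i) for i in range(12)]

slow_failure = is_frequent_enough(stream, failure_window=timedelta(hours=1))
fast_failure = is_frequent_enough(stream, failure_window=timedelta(minutes=20))
print(slow_failure, fast_failure)
```

With these assumed numbers, a 10-minute stream is adequate for a failure that develops over an hour but too sparse for one that develops within 20 minutes.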

Connectedness: Indicates the ability to trace a thread of connections for a well across all of the source data. This may appear to be simple, but it’s surprising how many different naming schemes can exist across teams and systems.
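
To illustrate the naming problem, the sketch below normalizes well names from two systems and reports which wells cannot be traced across both. The system names, well names, and the normalization rule (uppercase, letters and digits only) are assumptions for the example; real naming schemes often need an explicit mapping table rather than a simple rule like this.

```python
import re


def normalize_well_id(raw):
    """Uppercase the name and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def link_wells(sources):
    """Map each normalized well ID to the raw name used in each system."""
    linked = {}
    for system, names in sources.items():
        for name in names:
            linked.setdefault(normalize_well_id(name), {})[system] = name
    return linked


# Hypothetical well names as they might appear in two different systems.
sources = {
    "scada": ["Smith 12-H", "JONES_4"],
    "maintenance": ["smith-12h", "Jones 4", "Baker 7"],
}

linked = link_wells(sources)

# Wells that do not appear in every system break the thread of connections.
unconnected = [well for well, names in linked.items() if len(names) < len(sources)]
print(unconnected)
```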

Over the next two weeks, we will take a deeper dive into each of the dimensions above. We hope you will continue to join us as we discuss these dimensions and their importance in evaluating source data. To get a head start on learning more, you can request our white paper, “Data Quality Fuels the AI Race.”
