Coverage and continuity are two important dimensions when evaluating source data. Coverage describes the number of data sources available for a well. Many of the models used in advanced systems such as OspreyData combine sensor-based machine learning models with physics-based models, so a holistic view of well design, together with attention to maintenance history and failure reports, is critical to successful model development. Continuity describes how much source data is available without gaps or lapses. Continuity matters most when reviewing sensor streams or, for rod pumps, the set of dynacards available.
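As a rough illustration of the coverage dimension, the sketch below scores a well by the fraction of an expected set of data sources that are actually present. The source names are illustrative assumptions, not OspreyData's actual schema:

```python
# Hypothetical inventory of data sources for a rod pump well.
# These names are illustrative only.
EXPECTED_SOURCES = {
    "dynacards", "motor_current", "tubing_pressure",
    "casing_pressure", "maintenance_history", "failure_reports",
}

def coverage(available_sources):
    """Fraction of the expected data sources present for a well."""
    present = EXPECTED_SOURCES & set(available_sources)
    return len(present) / len(EXPECTED_SOURCES)

# A well with only three of the six expected sources scores 0.5.
print(coverage({"dynacards", "motor_current", "tubing_pressure"}))  # 0.5
```

In practice the expected set would vary by lift type and instrumentation, but even a simple score like this makes it easy to rank wells by how much source data a model can draw on.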
To see coverage and continuity in practice, review the figure above, which shows a set of sensors for a rod pump well. Each sensor is incomplete for the time range being viewed, with multiple lapses in the signals (highlighted in yellow). In fact, the lapses appear to cover more of the range than the actual signal values do.
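Lapses like the ones highlighted in the figure can be quantified directly from the timestamps of a sensor stream. The sketch below (a minimal illustration, assuming a known expected sampling interval; not OspreyData's actual pipeline) flags any gap between consecutive readings that exceeds a tolerance multiple of that interval, and reports the fraction of the span not lost to lapses:

```python
from datetime import datetime, timedelta

def find_lapses(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where the gap between consecutive
    readings exceeds tolerance * expected_interval."""
    lapses = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if (curr - prev) > expected_interval * tolerance:
            lapses.append((prev, curr))
    return lapses

def continuity(timestamps, expected_interval, tolerance=1.5):
    """Fraction of the observed span not lost to lapses (1.0 = no gaps)."""
    span = (timestamps[-1] - timestamps[0]).total_seconds()
    lost = sum(
        (end - start).total_seconds()
        for start, end in find_lapses(timestamps, expected_interval, tolerance)
    )
    return 1.0 - lost / span if span else 1.0

# Example: hourly readings with a 7-hour outage between hours 11 and 18.
base = datetime(2023, 1, 1)
stream = ([base + timedelta(hours=h) for h in range(0, 12)]
          + [base + timedelta(hours=h) for h in range(18, 24)])
print(len(find_lapses(stream, timedelta(hours=1))))          # 1
print(round(continuity(stream, timedelta(hours=1)), 2))      # 0.7
```

A per-sensor continuity score like this makes it straightforward to shade lapse regions on a plot such as the figure above, or to screen wells before model training.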
Lapses create gaps in the history of the well, and a key change in signal may be lost or missed during such a gap. In multiple projects, OspreyData has seen a 30% reduction in the number of failure examples available for model training, specifically because no signal was recorded preceding the failure. These wells had poor data continuity: with no signal stream to observe before the failure, the machine learning models had to exclude those failures from training.
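This screening step can be sketched as follows. The function below (a hypothetical illustration with made-up window and threshold parameters, not OspreyData's actual criteria) keeps only those failure events whose lookback window contains enough readings to be useful for training:

```python
from datetime import datetime, timedelta

def usable_failures(failures, readings, lookback=timedelta(hours=24),
                    interval=timedelta(hours=1), min_coverage=0.8):
    """Keep failures whose preceding lookback window has enough signal.

    Coverage is the fraction of expected readings (window length divided
    by the sampling interval) actually present in the window.
    """
    usable = []
    for failure_time in failures:
        window_start = failure_time - lookback
        present = sum(window_start <= t < failure_time for t in readings)
        expected = lookback / interval  # timedelta / timedelta -> float
        if present / expected >= min_coverage:
            usable.append(failure_time)
    return usable

# Example: hourly readings for the first 48 hours only.
base = datetime(2023, 1, 1)
readings = [base + timedelta(hours=h) for h in range(48)]
failures = [base + timedelta(hours=48),   # full 24h of preceding signal
            base + timedelta(hours=100)]  # no preceding signal at all
print(usable_failures(failures, readings))  # keeps only the first failure
```

The second failure is dropped exactly as described above: the model has nothing to learn from, because the signal stream preceding it is missing.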
In short, coverage and continuity are essential dimensions in any evaluation of source data quality. To explore the impacts of these issues further, we suggest you request our whitepaper, “Data Quality Fuels the AI Race.” Feel free to comment or ask questions about the whitepaper in the comments section below; we would love to hear your thoughts and begin a conversation.