Interpolation and sampling effects on recurrence quantification measures

The recurrence plot and the recurrence quantification analysis (RQA) are well-established methods for the analysis of data from complex systems. They provide important insights into the nature of the dynamics, periodicity, regime changes, and many more. These methods are used in different fields of research, such as finance, engineering, life, and earth science. To use them, the data have usually to be uniformly sampled, posing difficulties in investigations that provide non-uniformly sampled data, as typical in medical data (e


I. INTRODUCTION
The analysis of data from complex real-world systems creates the basis for many different fields in science, such as finance, engineering, life, and earth science.For many kinds of data, standard measures, such as mean, standard deviation, or higher moments, are not sufficient to capture the details of their dynamics.For this purpose, methods from complex system science are more appropriate, such as Lyapunov exponents, 1 complex networks, 2 symbolic dynamics, 3 or recurrence analysis. 4Recurrence analysis provides a set of tools, e.g., for studying synchronization, classifying different Chaos ARTICLE pubs.aip.org/aip/chatypes of dynamics, or detecting regime transitions. 5The increasing popularity of this framework is reflected by the lively methodical development and the growing number of applications in many scientific disciplines. 6n several fields, data are only available on a non-equidistant time axis; for instance, measurements based on heartbeats are timed by the rhythm of the heart, 7,8 which gives a natural varying timescale, and this leads to the necessity of interpolation when compared with other variables that form the cardiorespiratory system, such as the blood pressure. 9Time series of pulsars and rotating stars are obtained through non-equidistant observations, 10,11 and paleoclimate data are hampered by the non-constant sedimentation rate, resulting in non-equidistant sampling. 12Such unevenly sampled data are a challenge for most time series analysis techniques, which usually require equidistant sampling points.One solution is to transform the time series to an equidistantly sampled one using interpolation or other techniques (e.g., transformation cost 13 ).Other approaches try to modify the time series analysis tools to be directly applicable to uneven time series, such as Lomb-Scargle periodogram, 14 kernel-based correlation, 15,16 or edit distance-based recurrence analysis. 17However, interpolation is still a widely used technique, although it can cause serious bias in the results (overemphasizing the lower frequencies).Although this interpolation effect is known for several time series analysis techniques, 18 it is not yet systematically considered for recurrence analysis and, thus, can lead to wrong interpretations and conclusions.
6][27][28][29] Changes in the recurrence properties, mainly based on changes in the distribution of the diagonal line structures in recurrence plots (related to determinism or predictability of the underlying dynamics), can be used to identify regime changes.Paleoclimate data are usually retrieved from specific geological archives, e.g., marine and lake sediments, tree rings, ice cores, or stalagmites.The climate information is stored during the growth (deposition) of these archives.Since the growth or sedimentation rate can differ over time, but the sampling procedure of such archives usually uses an equidistant sampling scheme, and the final time series are usually unevenly sampled in time. 12sing interpolation without due consideration before conducting recurrence analysis can lead to bias in the distribution of diagonal line structures in the recurrence plots, finally resulting in erroneous conclusions.
In this work, we show the effects of different and changing sampling times and subsequent interpolation on the recurrence quantification analysis using different types of model systems as well as a real paleoclimate example.
We further suggest an approach to estimate quantitatively these effects and provide a correction scheme.Although we focus here on applications in paleoclimate research, the correction scheme can be analogously applied on other research questions, e.g., in astrophysics or physiology.
This work is structured as follows: In Sec.II, the recurrence plot and recurrence plot measures are introduced.In Sec.III, the used models and methods are given.Then, we present the measured effects for the simulated systems.Afterward, in Sec.IV, we present the derivation as well as the evaluation of our correction scheme.In Sec.V, a real-world example is considered.We conclude our findings in Sec.VI.

II. PRINCIPLES OF RECURRENCE PLOTS (RPs) AND RECURRENCE QUANTIFICATION ANALYSIS (RQA)
The recurrence plot (RP) is based on the fundamental principle that a dynamical system will always return to a state arbitrarily close to its initial or any other state. 4This fact is used to simplify the generally multidimensional phase space trajectory to a matrix containing only zeros and ones.This method can be applied to any series of states in a given phase space x i | i = 1, . . ., N , where x i is the phase space vector at the time step i (corresponding to time t = i • dt and dt the sampling time) and N the number of considered states (or the length of the time series).If we do not have access to the full phase space vector, it can be reconstructed using time delay embedding. 30,31To create a matrix representation of the recurrences of the data, the phase space distance from all pairs of data points is calculated using a suitable norm, represented by the distance matrix ( If not stated otherwise, we use the Euler norm.A recurrence is finally defined as having the pairwise distance D i,j between the states smaller than a specified recurrence threshold ε; i.e., the recurrence matrix is derived from the distance matrix D by applying the threshold ε, leading to the binary recurrence matrix R with its elements with being the Heaviside function and ε being the recurrence threshold. 4he recurrence matrix R can be displayed as a plot, where all the ones (R i,j = 1) are marked by points, and such an entry is, therefore, called a "recurrence point" or a "1-point."In contrast, the points R i,j = 0 are called a "non-recurrence point" or a "0-point."This plot is then called a recurrence plot (RP) and can be visually interpreted because it expresses rich patterns characteristic of specific dynamics. 4,32lthough the first visual impression gives some important hints, further (quantitative) analysis is often useful.For this purpose, the recurrence quantification analysis (RQA) was introduced. 7,33,34he recurrence rate RR, the fraction of recurrence points in the RP, gives a quantification of how often the system returns to the same region in the phase space.Most of the other measures in RQA rely mainly on the length distribution P l (l) of either diagonal or vertical lines (formed by the recurrence points) visible in the RP.Such a diagonal line is denoted here as a "recurrence line" or a "1-line."The idea behind the study of diagonal recurrence lines is that their length corresponds to the time the system evolves similarly compared to the other times the system visited the same region in the phase space.Therefore, the fraction of 1-points that form diagonal 1-lines in the RP is a heuristic measure of how deterministic the system is, Chaos ARTICLE pubs.aip.org/aip/cha with P l (l) being the probability to find a 1-line with exact length l, l min a chosen minimal line length, and N the maximal line length (equal to data length).This measure is called determinism and ranges between 0 and 1.It becomes 1 if all recurrence points belong to diagonal lines equal or longer than l min and 0 if no recurrence point does.5,36 The calculation of DET is based on the distribution of diagonal line lengths.For the sake of simplicity, here, we focus on the average diagonal length, In contrast to the standard definition of this measure, 4 we consider here all line lengths, including such of length 1.This simplification allows us to construct correction schema for possible sampling and interpolation effects.This measure is related to the prediction time.However, it depends on the temporal resolution of the system.For Gaussian white noise, L is 1/(1 − RR) and, therefore, for small RR close to one, whereas for perfectly periodic systems, it should theoretically be infinite but is limited by the RP size and boundary. 37L is very sensitive to noise because noise causes random interruptions in the diagonal lines.In paleoclimate studies, measures quantifying the diagonal lines are often interpreted in terms of the predictability of the climate. 29,35,36n many cases, the temporal variation of the RQA measures is of interest as they can reveal changes or transitions in the system's dynamics, such as when the system is approaching a tipping point.To determine such changes, the sliding window approach is used.The time series is partitioned into smaller segments (windows) of a predetermined length, which may overlap with one another.Then, the RP and the RQA measures are calculated for every window separately.The distance between the start points of two consecutive windows is called the window step size, and the size of a window is the window size.It is important to determine which time point is assigned to each window.In this study, we just take the starting point of the window so that all points used to calculate the measure are in the interval after this time point.

III. INTERPOLATION EFFECT ON RECURRENCE ANALYSIS OF MODEL SYSTEMS
To demonstrate the interpolation effect, we use two model systems to generate prototypical data.We use an autoregressive model, where the course of the data is stochastically driven, and a Roessler 38 system, which is a typical non-linear system, fully described by three non-linear ordinary differential equations.
To quantify the deviation in the L measure due to the difference in the sampling rate and subsequent interpolation, we compare L ref calculated from a reference series without interpolation with L int calculated from the interpolated series with the same temporal resolution as the reference series by taking the ratio L int /L ref .Ratios greater than 1 indicate an increase of the measure due to the interpolation and ratios smaller than 1 a decrease.This could then either lead to an over-or underestimation of an L ref value when analyzing the interpolated series.The effect depends on the considered system, the interpolation ratio, and the offset.The interpolation ratio r is the ratio of the sampling time of the time series before and after interpolation.The offset is the time difference between the first value in the time series before and after interpolation (Fig. 1).
An interpolation ratio greater than 1 increases the total number of points.For integer interpolation ratios, every rth point in the interpolated series lines up with the underlying series only if the offset is zero.

A. Systems
We use the two different model systems for our study and generate the data using the following equations.To remove transient behavior from the initial conditions to some kind of stable dynamics, the first part of every time series is discarded.

Autoregressive model
The time series is generated iteratively according to the rule where ξ is Gaussian white noise (mean zero and standard deviation one) and a and σ are free parameters, defining the auto-correlation (memory) and the impact of the noise.We used a parameter setting of a = 0.99 and σ = 1.Three series are generated for the analysis with each 2501 data points.The initial conditions are drawn from a normal distribution, and the first 100 points are discarded.The noise and the initial conditions are generated with a random number generator with three different seeds to make the results reproducible.

Roessler model
The Roessler system is fully described by the differential equations 38 We use the standard parameters a = 0.15, b = 0.2, c = 10 and a temporal resolution of dt = 0.01.To approximately solve these equations, we used the Euler method and the initial condition (x = 15, y = 0, z = 0).The obtained series has 12 505 data points, and the first 500 points are discarded.Afterward, only every fifth data point is used so that the series of both systems have 2401 points.

B. Method
To mimic the effect of different sampling times and subsequent interpolation, we consider high-resolution series, representing the "true" dynamics.To obtain series with different temporal resolution, multiple downsampled versions are constructed.One downsampled time series is chosen as reference series and analyzed without interpolation so that the results are unchanged and describe the dynamics of the system.All other downsampled series are called the test series (see Fig. 2).In the analysis, the offset for the reference series is chosen to be 0 because only the difference in the offset between the reference and the test series is important.This gives the following series: High-resolution series: Reference series: Test series: where N hr , N ref , and N test,n are the lengths of the different series, d ref is the downsampling factor of the reference series, and d test,n and k test,n are the downsampling factors and offsets of the different test series, where n numbers the different test series.The corresponding times are given by t i = i • dt.We generate interpolation functions for every test series using linear, quadratic spline, cubic spline, and pchip interpolation.To obtain the interpolated series, we evaluated these interpolation functions at the same time points as the reference series (see Fig. 1).
Using the reference instead of the high-resolution time series as a comparison, it is possible to study non-integer interpolation ratios, offsets, and also interpolation ratios smaller than one.The interpolation ratios are given as where dt hr , dt ref , and dt test,n are the different sampling times from the series defined above and dt int is the sampling time of the interpolated series.
The RP and the RQA measure L are calculated for the reference and for every interpolated series using the same recurrence threshold ε.L ref is the measure obtained for the reference series and L int for the interpolated series.The differences in the RQA measure are due to the different sampling times and the interpolation.We can compare the results from all interpolated series to the reference series by computing all ratios L int /L ref .
Using this procedure, we mimic the typical sampling bias, e.g., in paleoclimate studies, where sliding windows with different sampling are all interpolated to the same reference sampling and the hidden "true" dynamics is the continuous nature and is not accessible.

C. Results
To illustrate the deviations in the RP between an interpolated series and a reference series, we consider a short time series of an arbitrary autoregressive process and downsample it by a factor of four before interpolating it linearly back to the original size.This procedure leads to an interpolation ratio of r = 4 and no offset.Here, the high-resolution time series corresponds to the reference series.Comparing the RP of the interpolated time series with the RP of the reference time series, we find some differences (Fig. 3).Points that are 1-points in both RPs are black, points that are only 1-points in the reference RP are red, and points that are only 1-points in the interpolated RP blue.The coarse structure is preserved under the downsampling and interpolation; however, on a smaller scale, many short 1-lines are missing in the interpolated RP, while other 1-lines are merged into greater recurrence structures.Both effects lead to an increased average 1-line length.
To quantitatively study the effects on the average 1-line length, we use the method described in Sec.III B. For every model system, the high-resolution series is created according to Sec.III A. From this one, reference and multiple test series are created by downsampling with different offsets and downsampling factors d test,n .We use downsampling factors between 1 and 50 and for every downsampling factor five different offsets, if possible.For series with a downsampling factor smaller than five, the number of different offsets is limited by the downsampling factor.
The reference series are obtained with a downsampling factor d ref = 5 and no offset.The high-resolution series have a length of 2401.The reference series have, therefore, 481 data points.The recurrence thresholds for the systems are chosen so that the RP of the reference series has a recurrence rate of 10%.The recurrence thresholds as well as the L ref measure are given for the Roessler and the three autoregressive systems in Table I.The number in the name of the autoregressive systems (AR-42, AR-43, and AR-44) states the used seed for the random module of the Python library NumPy.With the three different autoregressive systems, we can check whether the results are changed for other realizations of the noise.For every system, the ratio L int /L ref is calculated for every interpolation ratio, interpolation method, and offset separately.To simplify the data, the mean and the standard deviation are calculated over the different offsets, and for the autoregressive systems, also over the three different realizations.The result is, therefore, only dependent on the kind of the system, the interpolation ratio, and the interpolation method.

Roessler system
For the Roessler system, L ref and L int values are almost equal for interpolation ratios smaller than 1.5 (Fig. 4 left).For a larger interpolation ratio, the L int values for the different interpolated series are smaller than L ref and decrease with an increasing interpolation ratio.The quadratic and the cubic spline interpolation lead to very similar results, whereas the linear and the pchip interpolation create the strongest deviation.The differences caused by the offset values are negligible for the linear interpolation, except at the integer ratios 2, 3, and 4.An explanation for this behavior is given in Sec.IV.Additionally, the same analysis was done with the maximum norm instead of the Euler norm.The results are qualitatively similar (see Appendix B).

Autoregressive process
For the interpolated series from the autoregressive systems, L int increases with growing interpolation ratios for all methods (Fig. 4, right).L int calculated from the linearly interpolated series shows the strongest deviation from the reference series.The effect of different realizations and offsets is similar for all considered interpolation methods.
To investigate the deviation caused by different offsets, L int /L ref is calculated without offset and compared with the mean over all offsets.We find that the deviation is only different for integer interpolation ratios (Fig. 5).This is as expected because only for integer interpolation ratios, an offset determines whether all time points of the test series are aligned (without offset) or misaligned (with offset) to the interpolation time points (see Fig. 1).The L int /L ref value is smaller without an offset, which means that the deviation of L int from L ref is smaller.
This result shows that interpolation can have a remarkable impact even if the interpolation ratio is 1 if there is some time offset between the points of the interpolated and test series.This situation commonly arises when the sampling time varies, causing misalignment between the interpolated series and the data.Even in portions of the data where the sampling time in the interpolated series is equal to the data's sampling time, the exact timings of the different series can be misaligned.

IV. CORRECTION SCHEME
Knowing the specific effect of interpolation on RQA measure L, it becomes apparent that a correction scheme aimed at mitigating this interpolation effect would be both desirable and feasible.

A. Method
To construct a correction scheme, which models the deviation in the L measure between an interpolated series to the reference series, it is necessary to account for the influences of interpolation and the differences in the sampling rate.
L can be calculated using the total number of diagonal 1-lines (i.e., lines consisting of values 1), where N r is the total number of 1-points in the RP, N l is the number of 1-lines (including the single points, i.e., lines with l = 1), N is the length of the series, and, therefore, N 2 is the total number of points in the RP, and RR is the recurrence rate.The total number of points N 2 in the RP generated from the reference series R ref and the interpolated one R int is equal.In all investigated examples, the recurrence rate RR is also very similar; therefore, the deviation in L can be derived from the difference in the number of 1-lines N l .As we will show, the number can differ, because To tackle the problem, we restrict ourselves to the following case: • There is an integer interpolation ratio between the test series and the interpolation series and no offset.
This means that the test series consists of every rth data point from the reference series and is, therefore, a downsampled version of it.The downsampling results in a RP R test consisting of every rth point in every rth row of the reference RP R ref (with r being the interpolation ratio).The same is true for the diagonals: The ith diagonal in R test consists of every rth point from the (r We call these points anchor points and these diagonals anchor diagonals (Fig. 6).To calculate the relative change in the number of 1-lines, we restrict ourselves to these diagonals.
At first, we calculate the difference in the number of 1-lines, due to the different sampling, by comparing the structures in R ref and R test .There are more 1-lines in R ref if there are more 1-lines starting between two anchor points than depicted by them.We refer to the sequences of 0-and 1-points between two anchor points as intervals.Only if the first anchor point is a 0-point and the second one is a 1-point, there is a starting point in R test ; all further starting points in R ref are additional.To quantify this effect, we can calculate the mean number of 1-line starting points.This is calculated separately for intervals following a 1-or a 0-anchor point, where N l 1 is the mean number of additional 1-line starting points per interval, for intervals following 1-points, and N l 0 for intervals following 0-points.P N l ,1 ( N l ) and P N l ,0 ( N l ) are the probabilities to find intervals with N l 1-lines starting points, starting on either 1-points or 0-points.
To find these probabilities, we calculate the length distribution for sequences of alternating 0-and 1-lines, which have N l 1-lines starting points and calculate the probability that they fit between two anchor points.Such a sequence between two 1-points has a length between l s is the start length, which is the number of 1-points after a random point on a random 1-line and d i and l i are the length of random 0-lines and 1-lines.Here, the second equation includes the 1-line containing the second 1-point and the first does not [example for N l = 2 in Fig. 7(a)].A sequence between a 1-point and a 0-point, which has N l 1-line starting points, has a length between [example for N l = 1 in Fig. 7(b)] and To get the probability that such sequence fits between a 1-anchor point and a random second anchor point, we add up probabilities for a 1-and a 0-point as a second anchor.In both cases, the probability is derived from the probability that the sequence is shorter than the interval length, multiplied by the probability of reaching the next anchor point when considering the next line together.The second factor is always a conditional probability because both lengths are not independent of each other.The distance between two anchor points and, therefore, the interval length is the same as the interpolation ratio r.The underlying probability distributions are given later.In total, we get where r is the interpolation ratio and P(X < r) are cumulative probabilities, which can be calculated from the probability distributions: P(X < r) = r−1 X=0 P X (X), where P X (X) are the probability distributions of the different sequence lengths.
The first two rows give the probability that if the first anchor point is a 1-point that there are N 0-lines and ( N − 1) 1-lines before the next anchor point, which is also a 1-point.The third and the fourth row calculate the probability that there are N 0-lines and N 1-lines before the next anchor point, which is a 0-point.In both cases, the total interval has N more 1-lines than the interval downsampled to the two anchor points.This is explicitly given for all possible configurations for one example interval in Appendix C. The first and third row are calculated from the probability distribution of the sum of the given random variables [Eqs.(11)  and ( 13)].For our calculation, we assume that the probabilities for the length of the consecutive 1-and 0-lines are independent.This gives where * represents a convolution and [P d * P l ] x(M) indicates that this part of the equation is repeated M times.P ls (l s ), P d (d), and P l (l) are the probability distributions for the start length, the 0-lines, and the 1-lines.P d (d) and P l (l) have to be known and are only accessible from the 0-and 1-line length histograms of the reference recurrence plot.How to estimate these when the reference series is not known is discussed later.The probability distribution for the start length l s is (see Appendix A) for the conditional probabilities in row two, we first need to modify the probability from before so that the condition is fulfilled, where the part in front of the curly bracket is the normalization factor.This can now be used to calculate and equivalent for row four.P N l ,0 ( N l ) can be calculated in the same way by interchanging all d i and l i and calculating l s with P d (d) instead of P l (l).
We now calculate the difference between R int and R test .Afterward, we can add up the changes.When interpolating the test series, the RP R int regains the size of the reference RP R ref , and every anchor point is equal in both plots (Fig. 6), If we make the assumption that the anchor diagonals have the same statistics for the length of 0-and 1-diagonal lines compared to all diagonals in the interpolated RP R int , we only have to investigate the intervals between anchor points.For linear interpolation, there are four possibilities in R int .For the formal derivation, see Appendix B.
1.In intervals lying between two 1-anchor points, all points are 1; therefore, there is no 1-line starting point.2. In intervals lying between two 0-anchor points, there is at most one 1-line in between; therefore, there is either one or zero 1-lines starting point.3.In intervals lying between one 1-and one 0-anchor point, then there is one 1-0 transition and therefore, no start of a 1-line.4. In intervals lying between one 0-and 1-anchor point, then there is one 0-1 transition and, therefore, one start of a 1-line.
Only the second point causes differences in the number of 1-lines between the plots because there is no start of a 1-line in R test , but there might be in R int .If there is a 1-line, we call this case a jump.The total number of jumps is N j and can be directly measured by counting the number of intervals with jumps in R int .From this, we can follow that there is always a bigger or equal amount of 1-lines on anchor diagonals in the interpolated RP R int compared to the test RP (R test ).
The estimated total difference N l of 1-lines lying on anchor diagonals between R int and R ref can now be calculated as the sum of, minus the number of intervals following a 1-anchor point times the mean number of 1-line starting-points in such an interval minus the number of intervals following a 0-anchor point times the mean number of 1-line starting points in such an interval plus the number of jumps, with N d being the total number of points on the anchor diagonals, r is the interpolation ratio, and RR is the recurrence rate.
N d r RR is the number of intervals following a 1-anchor point, and is the number of intervals following an 0-anchor point.The first two summands quantify the difference between R test and R ref , and N l 1 and N l 0 are calculated using Eqs.( 9) and (10).The last summand quantifies the difference between R int and R test .
To get the estimated deviation of the average diagonal line length from this consideration, we first use Eq. ( 8) to write where L int is the average line length measured in the interpolated RP R int and L est is our estimation of L ref measured in the reference RP R ref .N int l is the number of 1-lines in the interpolated RP R int on the anchor diagonals, and N l is the estimated difference of the 1-lines between the reference and the interpolated RP on these diagonals.
After inserting Eq. ( 8) with N r = N d • RR, which is the number of 1-points on the anchor diagonals, and Eq. ( 22), this can be written as The total number of jumps N j can also be rewritten as a probability n j that a random interval contains a jump by dividing the number of jumps N j by the number of intervals N d /r, Inserted in Eq. ( 24), this gives the final result This equation can be used to estimate the correction for the effects of the different sampling and the interpolation under the conditions that there is an integer interpolation ratio r and no offset between, the reference series and the test series.Furthermore, anchor and not anchor diagonals in the interpolated RP must have similar probability distributions for the 1-and 0-lines (P l (l) and P d (d)), and the probabilities for consecutive lines have to be independent.In order to calculate our estimate L est from a measured L int from the interpolated RP, it is necessary to know the statistics for the 1-and 0-lines [P l (l) and P d (d)] from the underlying dynamics.This might for a real-world example be accessible from a period in the data with a high sampling rate.The recurrence rate RR as well as the jumping probability n j can be directly computed from the recurrence plot of the interpolated series.The recurrence rate is just the fraction of 1-points, and n j is calculated by counting the intervals that contain a jump and dividing the results by the number of intervals.
In Sec.IV B, the use of this correction scheme is demonstrated on a simulated data processing example, where everything is known about the true underlying dynamics.Afterward, the scheme is used to correct for sampling and interpolation-induced biases in real-world data.

B. Results
To evaluate the correction scheme, L int /L ref is calculated for the first autoregressive system and the Roessler system in the same way as before (Sec.III) and compared to the correction L int /L est , which is calculated with the described scheme.r being the interpolation ratio is varied, and while n j and RR are obtained from the interpolated series and the corresponding recurrence plot R int , P l (l) and P d (d) are obtained from the reference RP R ref .
The estimated L int /L est for the autoregressive process follows the increase in L int /L ref with increasing interpolation ratio in the real data very closely if the interpolation ratio is smaller than 5 [Fig.8(b)].For higher values, there are some deviations caused by the small size of the test series.It has less than 100 data points for an interpolation ratio greater than 5.For the Roessler system, the correction only qualitatively captures the decrease, but with an underestimation of the effect [Fig.8(a)].FIG. 8. Estimation and measured relative deviations of the L measure between the interpolated and reference series for the Roessler (top left) and autoregressive system (top right) (the same as in Fig. 4) for integer interpolation ratios.Equation ( 26) is used to calculate the correction.The bottom row shows the same systems, but the L-measure for the calculation of the change as well as for the calculation of the correction is obtained only from the anchor diagonals (Fig. 6).
This deviation is caused by the fact that the following assumptions are not true for the Roessler system.On the one hand, the statistics of the diagonal lines are very different when looking at all diagonals compared to the anchor diagonals.If only these diagonals are considered, the correction for the Roessler system captures the effect also quite well [Fig.8(c)].On the other hand, the assumption that the length of consecutive 1-and 0-lines is independent of each other is not true for a deterministic system, such as the Roessler system.
Using consideration from Sec. IV A, it is possible to understand the reasons for the differences between the systems.For the autoregressive process, the dominating effect is that there are fewer short 1-and 0-lines in R int compared to R ref ; therefore, the total number of lines is smaller and L int is greater than L ref .The dominating effect for the Roessler system is that there are more short 1-lines between two anchor points in R int than in R ref ; therefore, the total number of lines is greater and L int is smaller than L ref .

V. APPLICATION TO PALEOCLIMATE DATA
In this section, we use the insights from the theoretical considerations and simulations above to investigate real-world data.We use the described correction scheme to reduce the influence of the interpolation and the different sampling times.
For this purpose, we use the data from a 290 m long composite core from Chew Bahir in southern Ethiopia.The core was created by combining the two ∼280 m long parallel lacustrine sediment cores CHB14-2A and 2B, which were drilled within the Chew Bahir Drilling Project (CBDP) in 2014. 39The aim of the CBDP was to establish a high-resolution environmental record spanning an important time interval of human evolution in eastern Africa.The potassium (K) concentration in the sediment is a proxy for the aridity in the region during the time of deposition. 40It was determined by micro x-ray fluorescence (µXRF) scanning, which was carried out with a spatial resolution of 5 mm.To attribute time points to the measured K concentrations, the age-depth model RRMarch2021 41 is used, which follows a direct-dating approach and uses multiple chronometers, including AMS 14 C dating, optically stimulated luminescence (OSL) dating, argon-argon ( 40 Ar/ 39 Ar) dating, and tephrochronological data.Details about the age-depth model can be found in Roberts et al. 41 In the following, we use this environmental record to show sampling and interpolation effects, which are not dependent on the accuracy of the age-depth model.A detailed RQA and corresponding interpretation has been performed in Trauth et al. 42

ARTICLE pubs.aip.org/aip/cha
The age-depth model reveals a non-constant sedimentation rate; i.e., the data points in this time series are not sampled equally in time.We investigate the temporal change of the mean diagonal line length L using the sliding window approach with a window size of 3079 yr.Therefore, we get a measure of the temporal evolution of the predictability of climate.The mean sampling time changes between these windows.In the oldest part from 616 787 to 434 038 yrBP, the sampling time is around 15 yr, and from 434 038 to 148 149 yrBP, there is a sampling time of around 10 yr.In the newest part of the data from 148,149 yr BP until present, the sampling time changes a lot, from a minimum of around 5.36 to 47.27 yr [Fig.9(a)].The strong changes in the sampling time result from the increased number of age measurements available for more recent times.The long periods of uniform sampling times arise due to the limited availability of datable material within the cores during those time intervals.
To analyze the data, we use the measured potassium concentration together with the corresponding time points from the age-depth model as our time series.We interpolate this series linearly and evaluate the interpolation function at time points with a fixed sampling time.To see the effect of a different choice of this sampling time, two time series are created, one with a sampling time of dt = 5.36 yr and one with dt = 47.2 yr.These sampling times correspond to the lowest and highest sampling times in the data.To avoid effects from the interpolation over long time periods with missing data (hiatuses), we exclude all points of the interpolated time series, where the temporal distance between the two neighboring measured data points exceeds 50 yr.For windows where more than 40% of the points are excluded, we do not calculate the L measure.This leads to some gaps in the L-series (see Fig. 9).
When comparing the L-curves calculated from the interpolated series with dt = 5.36 yr and dt = 47.2 yr, significant differences emerge.First, the first peak at approximately 10 000 yrBP is considerably stronger in the interpolated series with dt = 5.36 yr.Additionally, the amplitudes of the structures from 616 787 to 434 038 yrBP show an increase compared to the amplitudes of the structures between 434 038 to 148 149 yrBP [Figs.9(b) and 9(c)].The sampling time in the time intervals of increased amplitudes is lower than in the other parts, which leads to the hypothesis that the difference in the course of the data is due to the different interpolation.As we can see in Sec.III C, the deviations caused by an interpolation ratio smaller than 1 seem to be small compared to the effect of a larger interpolation ratio.This leads to the conjecture that the results from the interpolated series with dt = 47.2 are closer to the truth and the results of the series with dt = 5.36 yr are altered by the interpolation.To investigate this further, we apply our correction scheme to the L int values.To use this method, the original distributions P l (l) and P d (d) have to be known.We assume some stationarity; i.e., these distributions should not change too much within the course of the data.We, therefore, use the histograms of 0-and 1-line lengths from the interval where the original dt is 5.36 yr.This is the case between 36 132 and 45 501 yrBP.RR, L int , and n j are calculated for every window.The correction method can only be used for integer ratios; therefore, the interpolation ratio is calculated for every window by dividing the mean sampling time in the data inside the window by the interpolation sampling time of 5.36 yr and rounded to the next integer to calculate the correction.
After applying the correction, the first peak is reduced in its height, and also, the changes of amplitudes in the older part of the data match the results of the interpolated series with dt = 47.2 yr [Fig.9(d)].This is an indication that the correction provides a close approximation of the true effect, which leads to the deviations in the L measure and helps to avoid misinterpretation due to the interpolation.

VI. CONCLUSION
Recurrence plots and recurrence quantification analysis are useful tools for the analysis of data from non-linear systems.They can give valuable insights into the changes in the dynamics and indicate critical regime changes.Applying these methods to paleoclimate data give additional and complementary valuable insights, which are usually not accessible with standard linear methods.
When using RPs for paleoclimate data, we have to take into account that the data from paleoclimate archives are usually not uniformly sampled in time.One commonly used method to tackle this is to generate an interpolation function and evaluate it at equally spaced time points.This, however, can create some bias and lead to wrong conclusions.
In this work, we have demonstrated that, depending on the dynamics, the average diagonal recurrence line length L calculated from the RP of an interpolated signal can be greater or smaller compared to the true L derived from the RP of the raw signal with the same temporal resolution.We have shown this by comparing data with different sampling rates, which are interpolated to the time points of the reference time series.For a Roessler system, L calculated from the interpolated series is smaller than the one calculated from the reference series for all considered interpolation methods.For autoregressive processes, on the other hand, L is bigger.Furthermore, we have explained that for an integer interpolation ratio, a possible offset can also change the result, and this is even true if the temporal resolution is not changed.
We identified three main reasons for the difference between the series with different sampling and interpolation: (1) distinct recurrence lines in the reference RP are merged in the interpolated recurrence plot, (2) short recurrence lines in the reference RP are missing in the interpolated one, and (3) there are short recurrence lines in the interpolated RP, which are not present in the reference one.
Using these insights, we developed a correction scheme and discussed its capabilities and limitations.For the autoregressive process, it can predict the difference very precisely, as long as the interpolation ratio is not too large and the downsampled data, therefore, not too short.For the Roessler system, on the other hand, it can capture the decrease but underestimates the effect up to a factor of two.Here, the main reason for this deviation seems to be that the correction is only true for the diagonal lines in the interpolated plot, which correspond to a diagonal in the downsampled one.
In the last part of our work, we have investigated paleoclimate data from a lake sediment core.First, we showed that the course of the L measure, when calculated in sliding windows, strongly depends on the choice of sampling time when evaluating the interpolation function.We then applied our proposed correction scheme to produce an approximation of the actual course.
This study shows that there are potentially big biases, when interpolating data, before applying the RP method.It also shows that the effect is strongly system dependent, and a simple correction might not be possible.The proposed correction scheme provides an intuition on which system shows which effects and also provides an approximate correction.The correction scheme is only valid for integer interpolation values without offset, which is rarely the case for real data.Nevertheless, this can give an approximation of the real effect.Further research is required to enhance our understanding of the impact of offset and noninteger interpolation ratios.Another limitation of this study is that the series with different temporal resolutions are obtained by downsampling, which is only similar to the measurement process if the measure yields the value of some quantity at one time point.In reality, measurements often do some kind of averaging over a finite period of time.To understand to what extent this changes the result of this study and how to change the correction scheme, further research is required.

FIG. 1 .
FIG. 1. Schema illustrating the different series as well as the offset and interpolation.

FIG. 2 .
FIG. 2.Scheme illustrating the method used to investigate the change in the RQA measures due to the difference in the sampling time, offset, and interpolation.

FIG. 3 .
FIG. 3.Difference of interpolated and reference recurrence plots of an arbitrary AR process.Black pixel mark 1-points in both RPs; red pixel mark 1-points in reference RP, which 0-points in interpolated RP; and blue pixel mark 0-points in reference RP, which are 1-points in interpolated RP.The interpolation ratio is r = 4 and offset 0.

FIG. 4 .
FIG. 4.Relative deviation of L between interpolated and reference series for the Roessler system (left) and three autoregressive systems (AR) (right).The data points represent the mean over the different systems (only for AR) and offsets, and the error bars give the standard deviation.

FIG. 5 .
FIG. 5. Relative deviation of L between interpolated and reference series for three autoregressive systems (same as Fig. 4, right).The data points represent the mean over the different systems with (blue) and without (red) different offsets, and the error bars give the standard deviation.

1. separated 1 -
lines in R ref can be connected in R int , 2. 1-lines in R ref can be missing in R int , and 3. parts of 0-lines in R ref can form 1-lines in R int .(1) and (2) lead to fewer 1-lines in R int and a greater L compared to R ref .(3) leads to more 1-lines and a smaller L.

FIG. 6 .
FIG. 6.Recurrence plots of some arbitrary reference signal (left), the test signal (middle), and the interpolated signal (right).The red boxes indicate the anchor points, which are identical in all plots.Blue boxes indicate all points belonging to the anchor diagonals.The interpolation ratio r is 3.

FIG. 7 .
FIG. 7. Example intervals (a) between two 1-anchor points, showing one possible configuration with two additional 1-line starting points.(b) Between one 1-anchor points and one 0-anchor point, showing one possible configuration with one additional 1-line starting point.

FIG. 9 .
FIG. 9. (a) Mean sampling times for every window (red) and constant sampling times with which the interpolation function is evaluated (blue, green).The window step size is 3079 yr, and the window size is 6153 yr.(b) and (c) L for sliding windows, with dt = 47.2 yr (blue) and 5.36 yr (green).With fixed ε = 250.(d) Corrected data with a described scheme.

TABLE I .
Properties of the four different reference series.