---
title: matching_series
tags:
- evaluate
- metric
description: "Matching-based time-series generation metric"
sdk: gradio
sdk_version: 3.50
app_file: app.py
pinned: false
---

# Metric Card for matching_series

## Metric Description

Matching Series is a metric for evaluating time-series generation models. It matches the generated time-series with the original (reference) time-series and computes the distance, mean squared error by default, between matched instances. The metric outputs a score greater than or equal to 0, where 0 indicates a perfect generation.

## How to Use

At minimum, the metric requires the original time-series and the generated time-series as input. It can be used to evaluate the performance of time-series generation models.

```python
>>> import evaluate
>>> import numpy as np
>>> num_generation = 100
>>> num_reference = 10
>>> seq_len = 100
>>> num_features = 10
>>> references = np.random.rand(num_reference, seq_len, num_features)
>>> predictions = np.random.rand(num_generation, seq_len, num_features)
>>> metric = evaluate.load("bowdbeg/matching_series")
>>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
>>> print(results)
{'precision_distance': 0.15843592698313289, 'f1_distance': 0.155065974239652, 'recall_distance': 0.1518363944110798, 'index_distance': 0.17040952035850207, 'precision_distance_features': [0.13823438020409948, 0.13795530908046955, 0.13737011148651265, 0.14067189082974238, 0.1364122789352347, 0.1436081670647643, 0.14458237409706912, 0.13806270434163667, 0.1409687410230486, 0.14361925950728213], 'f1_distance_features': [0.1296088638995658, 0.1321776706161825, 0.13029775314091577, 0.13175439826605778, 0.12737279060587542, 0.1356699896603108, 0.13397234988746393, 0.12775081706715302, 0.1315612879575721, 0.13479662354178928], 'recall_distance_features': [0.12199655178880468, 0.12686452003437784, 0.12391796468320122, 0.12390010513296679, 0.11945686853897312, 0.12856343456552471, 0.12481307474748718, 0.11887226171295895, 0.12333088520535256, 0.1269952147807759], 'index_distance_features': [0.1675969516703118, 0.1670366499114896, 0.1671737398882021, 0.17176917018356727, 0.1648541323369367, 0.1719173137987784, 0.1718364937170575, 0.16298119493341198, 0.17348958360035996, 0.18543997354490532], 'macro_precision_distance': 0.14014852165698596, 'macro_recall_distance': 0.1238710881190423, 'macro_f1_distance': 0.13149625446428864, 'macro_index_distance': 0.17040952035850207, 'matching_precision': 0.1, 'matching_recall': 1.0, 'matching_f1': 0.18181818181818182, 'matching_precision_features': [0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 1.0, 0.8, 1.0, 1.0], 'matching_recall_features': [0.1, 0.09, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], 'matching_f1_features': [0.18, 0.16363636363636364, 0.17777777777777778, 0.18, 0.18, 0.18, 0.18181818181818182, 0.17777777777777778, 0.18181818181818182, 0.18181818181818182], 'macro_matching_precision': 0.91, 'macro_matching_recall': 0.099, 'macro_matching_f1': 0.17846464646464646, 'cuc': 0.12364285714285712, 'coverages': [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7666666666666666, 0.9333333333333332, 1.0], 'macro_cuc': 0.12047857142857143, 'macro_coverages': [0.10000000000000002, 0.19000000000000003, 0.32666666666666666, 0.51, 0.72, 0.8966666666666667, 0.99], 'cuc_features': [0.1175, 0.11607142857142858, 0.12214285714285712, 0.12507142857142856, 0.1202142857142857, 0.11735714285714285, 0.12042857142857144, 0.12028571428571429, 0.12864285714285717, 0.11707142857142858], 'coverages_features': [[0.10000000000000002, 0.20000000000000004, 0.3, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3666666666666667, 0.5666666666666667, 0.6666666666666666, 0.9, 0.9], [0.10000000000000002, 0.16666666666666666, 0.3333333333333333, 0.5, 0.6666666666666666, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.5666666666666667, 0.7999999999999999, 0.9333333333333332, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.43333333333333335, 0.6999999999999998, 0.9, 1.0], [0.10000000000000002, 0.20000000000000004, 0.26666666666666666, 0.43333333333333335, 0.6666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.4000000000000001, 0.6, 0.7333333333333334, 0.8666666666666667, 1.0], [0.10000000000000002, 0.16666666666666666, 0.3, 0.5666666666666667, 0.7666666666666666, 0.8666666666666667, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.5333333333333333, 0.8000000000000002, 1.0, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.4666666666666666, 0.7333333333333334, 0.8333333333333334, 1.0]]}
```
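The exact definitions of the returned values are given under Output Values below. As a rough, unofficial sketch of the core matching idea (not the actual implementation; variable names and toy sizes are chosen only for illustration), the distance-based scores can be thought of as follows:

```python
import numpy as np

# Toy shapes, chosen only for illustration.
num_generation, num_reference, seq_len, num_features = 8, 4, 16, 3
predictions = np.random.rand(num_generation, seq_len, num_features)
references = np.random.rand(num_reference, seq_len, num_features)

# Pairwise MSE between every generated and every reference instance,
# averaged over time steps and features -> shape (num_generation, num_reference).
diff = predictions[:, None] - references[None, :]
distance = (diff ** 2).mean(axis=(2, 3))

# precision_distance: each generated instance is scored against its closest reference.
precision_distance = distance.min(axis=1).mean()
# recall_distance: each reference is scored against its closest generated instance.
recall_distance = distance.min(axis=0).mean()
# f1_distance: harmonic mean of the two.
f1_distance = 2 * precision_distance * recall_distance / (precision_distance + recall_distance)

# Matching-based coverage: the fraction of references that are the nearest
# neighbour of at least one generated instance.
matched_references = np.unique(distance.argmin(axis=1))
coverage = len(matched_references) / num_reference
```

The feature-wise (`*_features`) and macro variants repeat the same computations per feature, and the coverage-related outputs repeat the coverage computation on random subsets of the generated instances of increasing size.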
### Inputs

- **predictions**: (list of list of list of float or numpy.ndarray): The generated time-series. The shape of the array should be `(num_generation, seq_len, num_features)`.
- **references**: (list of list of list of float or numpy.ndarray): The original time-series. The shape of the array should be `(num_reference, seq_len, num_features)`.
- **batch_size**: (int, optional): The batch size used when computing the pairwise distances. The distance computation scales quadratically with the number of instances, so batching can be used to bound memory usage. Default is None.
- **cuc_n_calculation**: (int, optional): The number of times the random subsampling is repeated when computing the coverage (results are averaged over repetitions). Default is 3.
- **cuc_n_samples**: (list of int, optional): The sample sizes used to compute the coverage curve. Default is $[2^i \ \text{for}\ i \leq \log_2 n] + [n]$, i.e. powers of two up to $n$ followed by $n$ itself.
- **metric**: (str, optional): The distance measure used to compare instances. Default is "mse". Available options are "mse", "mae", and "rmse".
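The optional arguments can be combined. The following is a sketch using only the inputs documented above; the argument values are arbitrary and only meant as an illustration:

```python
import evaluate
import numpy as np

metric = evaluate.load("bowdbeg/matching_series")

references = np.random.rand(10, 100, 10)    # (num_reference, seq_len, num_features)
predictions = np.random.rand(100, 100, 10)  # (num_generation, seq_len, num_features)

# Use mean absolute error instead of the default MSE, batch the pairwise
# distance computation, and control the coverage subsampling.
results = metric.compute(
    references=references,
    predictions=predictions,
    metric="mae",
    batch_size=500,
    cuc_n_calculation=5,
    cuc_n_samples=[2, 4, 8, 16, 32, 64, 100],
)
print(results["f1_distance"], results["macro_cuc"])
```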
### Output Values

Let the prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and the reference instances be $R = \{r_1, r_2, \ldots, r_m\}$.

- **precision_distance**: (float): Average, over the generated instances, of the distance to the closest reference instance. Intuitively, this is similar to precision in classification. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{distance}(p_i, r_j)$.
- **recall_distance**: (float): Average, over the reference instances, of the distance to the closest generated instance. Intuitively, this is similar to recall in classification. In the equation, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{distance}(p_i, r_j)$.
- **f1_distance**: (float): Harmonic mean of precision_distance and recall_distance. This is similar to the F1-score in classification.
- **index_distance**: (float): Average of the distance between the generated instance and the reference instance with the same index. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{distance}(p_i, r_i)$.
- **precision_distance_features**: (list of float): precision_distance computed individually for each feature.
- **recall_distance_features**: (list of float): recall_distance computed individually for each feature.
- **f1_distance_features**: (list of float): f1_distance computed individually for each feature.
- **index_distance_features**: (list of float): index_distance computed individually for each feature.
- **macro_precision_distance**: (float): Average of the precision_distance_features.
- **macro_recall_distance**: (float): Average of the recall_distance_features.
- **macro_f1_distance**: (float): Average of the f1_distance_features.
- **macro_index_distance**: (float): Average of the index_distance_features.
- **matching_precision**: (float): Precision of the matching instances. In the equation, $\frac{\left| \{\, i \mid \exists j,\ i = \arg\min_{i'} \mathrm{distance}(p_{i'}, r_j) \,\} \right|}{m}$.
- **matching_recall**: (float): Recall of the matching instances. In the equation, $\frac{\left| \{\, j \mid \exists i,\ j = \arg\min_{j'} \mathrm{distance}(p_i, r_{j'}) \,\} \right|}{n}$.
- **matching_f1**: (float): Harmonic mean of matching_precision and matching_recall.
- **matching_precision_features**: (list of float): matching_precision computed individually for each feature.
- **matching_recall_features**: (list of float): matching_recall computed individually for each feature.
- **matching_f1_features**: (list of float): matching_f1 computed individually for each feature.
- **macro_matching_precision**: (float): Average of the matching_precision_features.
- **macro_matching_recall**: (float): Average of the matching_recall_features.
- **macro_matching_f1**: (float): Average of the matching_f1_features.
- **coverages**: (list of float): Coverage of the reference instances by random subsets of the generated data, one value per sample size in cuc_n_samples. In the equation, $\left[ \frac{\left| \{\, j \mid \exists\, p_i \in \mathrm{sample}(P, k),\ j = \arg\min_{j'} \mathrm{distance}(p_i, r_{j'}) \,\} \right|}{m} \ \text{for}\ k \in \mathrm{cuc\_n\_samples} \right]$.
- **cuc**: (float): Area under the coverage curve computed from coverages over the sample sizes in cuc_n_samples, summarizing how quickly the references are covered as more generated instances are sampled.
- **coverages_features**: (list of list of float): coverages computed individually for each feature.
- **cuc_features**: (list of float): cuc computed individually for each feature.
- **macro_coverages**: (list of float): Average of the coverages_features.
- **macro_cuc**: (float): Average of the cuc_features.

#### Values from Popular Papers

### Examples

## Limitations and Bias

This metric assumes that the generated time-series should match the original time-series. This may not hold in some scenarios, so the metric may not be suitable for evaluating generation models that are not required to reproduce the original time-series.

## Citation

## Further References