Spaces:
Running
on
CPU Upgrade
Mistake in evaluation code
Hi, thank you for organizing this challenge! I believe there is a mistake in this code: https://huggingface.co/usm3d/tools/blob/main/hoho/wed.py
As an example, my code predicts 7 vertices, whereas the GT has 18 vertices. Then, when I experiment with different number of predicted edges, compute_WED computes the same score for edges = [(0,1), (2,3), (0,5)], edges = [(0,1), (2,3)], edges = [(0,1)], and even edges = []. This is of course counter-intuitive since there are only two cases:
- an additional edge is a correct one, then WED should decrease
- an additional edge is wrong, then WED should increase
The score can't stay the same.
Then, I started checking the code in more details, and it seems that I've found a bug. On L17, at runtime, row_ind = [0 1 2 3 4 5 6], col_ind = [ 7 8 17 14 4 5 6],
which means that row_ind correspond to indices of the predicted vertices, whereas col_ind corresponds to indices of the GT vertices. Then, on L35: there is this line:
updated_pd_edges = [(row_ind[np.where(col_ind == edge[0])[0][0]], row_ind[np.where(col_ind == edge[1])[0][0]]) for edge in pd_edges if edge[0] in col_ind and edge[1] in col_ind]
where edge in pd_edges is checked on being in col_ind (GT indices). Therefore, row_ind and col_ind are swapped.
The correct implementation should be:
updated_pd_edges = [(col_ind[np.where(row_ind == edge[0])[0][0]], col_ind[np.where(row_ind == edge[1])[0][0]]) for edge in pd_edges if edge[0] in row_ind and edge[1] in row_ind]
Moreover, later on, when computing deletion_edge_costs, edges_to_delete are defined in the space of "GT vertex indices", but pd_vertices are still indexed by the originally predicted indices. In short, edges_to_delete on this line should be redefined back to the indices (between 0 and 6):
vert_tf = [np.where(col_ind == v)[0][0] if v in col_ind else 0 for v in range(len(gt_vertices))]
deletion_edge_costs = ce * sum(np.linalg.norm(pd_vertices[vert_tf[edge[0]]] - pd_vertices[vert_tf[edge[1]]]) for edge in edges_to_delete)
These two changes are sufficient for correct implementation.
Another strange thing about this implementation is that edges are defined as tuples (oriented edges), but as far as I understand, in this task edge orientation doesn't matter. The current implementation would penalize providing (1,0) edge instead of (0,1). Therefore, I suggest this fix (switch to sets instead of tuples):
pd_edges_set = set(map(tuple, [set(edge) for edge in updated_pd_edges]))
gt_edges_set = set(map(tuple, [set(edge) for edge in gt_edges]))
One more issue: you never penalize edges that are connected to at least one vertex, which is not matched to any vertex in the GT. For instance, if the solution is the same as GT, but you add 1 more vertex and introduce edges from this vertex to all other vertices, then these additional edges are not penalized. The fix would be to add another term to deletion_edge_costs, which would also add lengths of edges_to_delete_no_match, edges with at least one vertex that is not matched.
Thanks for this! We are reviewing it now.
Thank you very much @rozumden for pointing this out. We have concluded that you are correct and have updated the metric. Please see our full response here: https://huggingface.co/spaces/usm3d/S23DR/discussions/4.