Abstract:
Biological and biomedical knowledge graphs can differ widely in design when the community has not widely adopted a standard, and even more so when they represent data from a previously underexplored domain. The resulting knowledge graphs may describe the same entities in different ways. Entity alignment methods compare entities across knowledge graphs to determine which ones represent the same real-world entity. These alignments facilitate data integration and interoperability between sources, which is crucial for understanding the relationships between disparate datasets and uncovering novel insights. However, little is known about how well these methods perform across different domains. We evaluate the performance of 20 entity alignment methods on biological knowledge graphs pertaining to gene regulation. Specifically, our investigation focuses on enhancer sequences, the most extensively researched type of cis-regulatory domains, and their potential targets. The hit values obtained illustrate the potential of these methods to align biological and biomedical knowledge graphs and improve their interoperability. Among them, GCN Align is the most robust and gives the best results in terms of matching ability and time required. Our study also examines the impact of the entity types included in the data schema. This analysis helps identify which domain areas are represented more effectively, which is crucial for optimizing the methods' efficacy and for identifying areas of improvement in the schemas used. Additionally, the workflow is extensible to other data domains and can be adapted to new methods.
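
The abstract reports "hit values" without defining them. In the entity alignment literature this usually means Hits@k: the fraction of source entities whose true counterpart appears among the k most similar candidates in the other graph. The sketch below illustrates that metric under this assumption; the entity identifiers, the toy embeddings, the cosine-similarity ranking, and the function names are all hypothetical and are not taken from the study.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hits_at_k(source_emb, target_emb, gold_pairs, k):
    """Fraction of gold pairs whose target is among the k nearest candidates."""
    if not gold_pairs:
        raise ValueError("gold_pairs must not be empty")
    hits = 0
    for src, gold_tgt in gold_pairs:
        # Rank every target entity by similarity to the source entity.
        ranked = sorted(
            target_emb,
            key=lambda tgt: cosine(source_emb[src], target_emb[tgt]),
            reverse=True,
        )
        if gold_tgt in ranked[:k]:
            hits += 1
    return hits / len(gold_pairs)


# Hypothetical toy embeddings for enhancer entities in two knowledge graphs.
source_emb = {"kg1:enhA": [1.0, 0.0], "kg1:enhB": [0.0, 1.0]}
target_emb = {
    "kg2:e1": [0.9, 0.1],
    "kg2:e2": [0.8, 0.6],
    "kg2:e3": [0.1, 0.9],
}
# Hypothetical ground-truth alignment between the two graphs.
gold_pairs = [("kg1:enhA", "kg2:e1"), ("kg1:enhB", "kg2:e2")]

hits_1 = hits_at_k(source_emb, target_emb, gold_pairs, k=1)
hits_2 = hits_at_k(source_emb, target_emb, gold_pairs, k=2)
print(f"Hits@1 = {hits_1:.2f}, Hits@2 = {hits_2:.2f}")
```

In this toy case the second source entity is ranked correctly only within the top two candidates, so Hits@2 is higher than Hits@1. A real evaluation would take the embeddings from a trained alignment model rather than from hand-written vectors.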