Research laboratories

Dr hab. Paweł Siedlecki

Laboratory of Cheminformatics and Molecular Modeling

ORCID: 0000-0002-7482-1341

Research scope

The Laboratory of Cheminformatics & Molecular Modeling (CHEMM Lab) investigates interactions among biomolecules by integrating machine learning and deep learning with cheminformatics. We aim to introduce novel approaches to in silico drug discovery, a rapidly evolving field, in order to predict and design the structure, function, and interactions of molecules.

Research

Key research achievements

  • We developed a series of state-of-the-art machine learning and deep learning solutions for protein-ligand affinity prediction and in silico molecular screening.
  • We designed and implemented various models and protocols utilizing short MD simulations to improve predictive performance and in silico screening outcomes.
  • We used a combination of experimental and in silico techniques to demonstrate, for the first time, an anti-proliferative effect from a cell-free supernatant (CFS), highlighting the potential of environmental lactic acid bacteria (LAB) strains in developing new treatments.

Research description

Cheminformatics, molecular modeling, and computational chemistry are integral to understanding biological systems, particularly the multifaceted interactions between proteins, nucleic acids, and ligands. These methodologies are essential not only in structure- and ligand-based drug discovery but also in medicine, agriculture, and environmental science.

The CHEMM group is dedicated to refining the sensitivity and specificity of affinity predictions, particularly in in silico high-throughput screening campaigns. We develop novel representations of biomolecular complexes and apply advancements in machine learning (ML) and deep learning (DL) to enhance the precision of binding affinity predictions. We explore novel compounds using a variety of theoretical and experimental methods, with a strong focus on their potential applicability.

One of the fundamental challenges in in silico screening is the reliance on suboptimal molecular conformations. In collaboration with Pedro Ballester (CNRS, Marseille), our team developed an effective model, RF-Score-VS, trained to predict affinity values from suboptimal, noisy data. What sets our model apart is its use of negative data, which constitutes about 97.5% of our training dataset. Our approach has significantly improved the ability to distinguish between active and inactive compounds at the top of ranking lists. This method has been widely cited and was featured in the list of 100 most-read articles in Scientific Reports [1].
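How well a scoring function separates actives from inactives at the top of a ranked list is commonly summarized with an enrichment factor. The sketch below is only an illustration of that metric, not the evaluation code used for the model described above; the function name, the toy scores, and the labels are assumptions made for the example.

```python
def enrichment_factor(scores, labels, top_fraction=0.01):
    """Ratio of the active rate in the top slice to the active rate overall.

    scores: predicted affinities (higher means more likely active).
    labels: 1 for an active compound, 0 for an inactive one.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Always inspect at least one compound, even for tiny libraries.
    n_top = max(1, int(round(n_total * top_fraction)))
    hits_in_top = sum(label for _, label in ranked[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy screen (made-up numbers): 2 actives among 10 compounds, both ranked first.
scores = [9.1, 8.7, 6.0, 5.5, 5.2, 4.9, 4.4, 4.0, 3.1, 2.0]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(scores, labels, top_fraction=0.2)
```

A value of 1 corresponds to random ranking, so larger values indicate that more actives are concentrated at the top of the list.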

In our pursuit of an information-rich yet interpretable representation, we explored interaction fingerprints (IFPs), a key concept in cheminformatics that allows molecules to be represented as fixed-length Boolean or integer vectors. Our PLEC FP (Protein Ligand Extended Connectivity Fingerprint) builds upon the ECFP fingerprint by utilizing atom environments rather than predefined functional groups or substructures. This method has proven highly efficient in predicting binding affinity, even with simple linear models, and addresses the need for a general, descriptive, and easily interpretable solution in the field [2].
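The general idea of folding protein-ligand contacts into a fixed-length integer vector can be sketched as follows. This is a toy illustration and not the PLEC FP implementation: the environment labels are hypothetical strings, and the hashing scheme and vector size are assumptions chosen for brevity.

```python
import hashlib


def hashed_pair_fingerprint(contacts, size=64):
    """Fold (ligand environment, protein environment) pairs into a count vector."""
    fingerprint = [0] * size
    for ligand_env, protein_env in contacts:
        key = f"{ligand_env}|{protein_env}".encode("utf-8")
        # hashlib gives a hash that is stable across Python runs,
        # unlike the built-in hash() for strings.
        digest = hashlib.sha1(key).digest()
        index = int.from_bytes(digest[:4], "big") % size
        fingerprint[index] += 1
    return fingerprint


# Hypothetical contacts: each tuple pairs a ligand atom environment
# with the environment of a nearby protein atom.
contacts = [
    ("C.ar", "N.am"),
    ("O.co2", "N.4"),
    ("C.ar", "N.am"),  # the same interaction type observed twice
]
fp = hashed_pair_fingerprint(contacts, size=64)
```

Because every position in the vector corresponds to a hashed pair of environments, the weights of a simple linear model trained on such vectors can be traced back to specific interaction types, which is what makes this kind of representation easy to interpret.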

Neural networks offer distinct advantages, such as the ability to autonomously identify features crucial for predicting interactions. We developed Pafnucy, a custom-designed convolutional neural network trained on a unique grid representation of ligand-receptor complexes. A thorough evaluation demonstrated enhanced accuracy without relying on hand-crafted, expert-defined features, making Pafnucy one of the first neural network models applicable to virtual screening. This work has received over 400 citations and has influenced current methodologies for building scoring functions [3].
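A grid representation of a complex can be pictured as atoms binned into the cells of a 3D box, with separate channels for different atom properties. The sketch below is a simplified assumption of how such a grid might be built; the box size, resolution, and the two channels (ligand, protein) are illustrative choices and are not taken from the Pafnucy implementation.

```python
import math


def voxelize(atoms, box_size=20.0, resolution=1.0):
    """Bin atoms into a cubic grid centred at the origin.

    atoms: iterable of (x, y, z, channel) tuples, coordinates in angstroms.
    Returns the number of cells per axis and a sparse grid that maps
    (i, j, k, channel) to the number of atoms in that cell.
    """
    n_cells = int(round(box_size / resolution))
    half = box_size / 2.0
    grid = {}
    for x, y, z, channel in atoms:
        # Shift coordinates so the box spans [0, box_size) on every axis.
        idx = tuple(math.floor((c + half) / resolution) for c in (x, y, z))
        if all(0 <= i < n_cells for i in idx):
            key = idx + (channel,)
            grid[key] = grid.get(key, 0) + 1
        # Atoms that fall outside the box are silently dropped.
    return n_cells, grid


# Made-up coordinates; channel 0 = ligand atom, channel 1 = protein atom.
atoms = [
    (0.2, 0.2, 0.2, 0),
    (-3.5, 1.0, 4.9, 1),
    (25.0, 0.0, 0.0, 1),  # outside the 20 Å box, ignored
]
n_cells, grid = voxelize(atoms)
```

In practice the sparse dictionary would be expanded into a dense 4D tensor before being passed to a convolutional network.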

Our recent studies have integrated molecular dynamics (MD) simulations with ML to assess both predictive performance and limitations. MD simulations provide a dynamic perspective by capturing the temporal interactions within protein-ligand complexes, offering additional insights into affinity and specificity estimates. By generating and analyzing over 2,500 unique protein-ligand MD simulations, we identified specific and generalizable features that improve predictive accuracy, suggesting new methods to enhance current in silico affinity prediction pipelines [4].
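One simple way to feed MD trajectories into an ML model is to collapse each per-frame time series into a handful of summary statistics. The sketch below assumes a series of protein-ligand atom distances and a 4 Å contact cutoff; the specific descriptors, the cutoff, and the function name are illustrative assumptions rather than the descriptor set evaluated in [4].

```python
import statistics


def summarize_time_series(values, contact_cutoff=4.0):
    """Collapse a per-frame distance series into fixed-length descriptors."""
    if len(values) < 2:
        raise ValueError("need at least two frames")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        # Fraction of frames in which the pair is within the contact cutoff.
        "contact_fraction": sum(v <= contact_cutoff for v in values) / len(values),
    }


# Made-up distances (in angstroms) for one atom pair over five frames.
distances = [3.0, 3.5, 4.5, 5.0, 4.0]
descriptors = summarize_time_series(distances)
```

Descriptors of this kind have the same length regardless of simulation length, so they can be appended to static features from a docked or crystallographic pose.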

The CHEMM group also focuses on practical applications of its in silico methodologies. One pressing issue in agriculture is the impact of novel compounds on pollinators. With the global decline of bee populations posing significant risks to agriculture, biodiversity, and environmental stability, we introduced ApisTox, a comprehensive dataset that explores the toxicity of pesticides to honey bees (Apis mellifera). ApisTox serves as a crucial tool for environmental and agricultural research, aiding in the development of policies and practices that minimize harm to bee populations. It is also a valuable resource for benchmarking molecular property prediction methods for agrochemical compounds [5].

In our ongoing search for therapeutically promising molecules, we have focused on microbiome-derived compounds with potential applications in colorectal cancer (CRC). Together with our industry partner, we analyzed lactic acid bacteria (LAB) strains using a combination of experimental, bioinformatics, and cheminformatics techniques. The analysis revealed a specific strain capable of releasing arginine deiminase (ADI) into the culture supernatant under gut-like conditions. This release significantly reduced epithelial cell growth, leading to decreased c-Myc levels, reduced phosphorylation of p70-S6 kinase, and cell cycle arrest. These results demonstrate, for the first time, an anti-proliferative effect from a cell-free supernatant (CFS), independent of bacteriocins or other small molecules, highlighting its broad therapeutic potential [6].

Bibliography

  1. Wójcikowski M, Siedlecki P, Ballester PJ. Building Machine-Learning Scoring Functions for Structure-Based Prediction of Intermolecular Binding Affinity. Methods Mol Biol. 2019;2053:1-12.
  2. Wójcikowski M, Kukiełka M, Stepniewska-Dziubinska MM, Siedlecki P. Development of a Protein-Ligand Extended Connectivity (PLEC) Fingerprint and Its Application for Binding Affinity Predictions. Bioinformatics. 2019 Apr 15;35(8):1334-1341.
  3. Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P. Improving Detection of Protein-Ligand Binding Sites with 3D Segmentation. Sci Rep. 2020 Mar 19;10(1):5035.
  4. Poziemski J, Yurkevych A, Siedlecki P. Assessment of Molecular Dynamics Time Series Descriptors in Protein-Ligand Affinity Prediction. ChemRxiv. 2024.
  5. Adamczyk J, Poziemski J, Siedlecki P. ApisTox: A New Benchmark Dataset for the Classification of Small Molecule Toxicity on Honey Bees. Sci Data. 2025 Jan 2;12(1):5.
  6. Jastrząb R, Tomecki R, Jurkiewicz A, Graczyk D, Szczepankowska AK, Mytych J, Wolman D, Siedlecki P. The Strain-Dependent Cytostatic Activity of Lactococcus lactis on CRC Cell Lines Is Mediated Through the Release of Arginine Deiminase. Microb Cell Fact. 2024 Mar 14;23(1):82.

Methodology

We primarily use computational tools and methods. We design learning architectures from scratch, mainly with PyTorch and sometimes TensorFlow. We employ semi-flexible molecular docking techniques and enhance their results with molecular dynamics simulations, MM/PBSA and MM/GBSA, as well as classical and ML-based scoring functions. We model and analyze 3D structures of molecular entities, from small molecules and small binders to proteins and their interaction interfaces. We are particularly proficient in designing and conducting in silico screening campaigns.

We also analyze omics data, particularly microbiome-related proteomics and 16S sequencing, to develop novel approaches for patient stratification and personalized interventions. We collaborate closely with experimental groups from science and industry, particularly in the areas of the gut-brain axis, rare diseases, and cancer, developing annotation pipelines and predictive workflows in Python and R.

Selected publications

  • Wójcikowski M, Kukiełka M, Stepniewska-Dziubinska MM, Siedlecki P. Development of a Protein-Ligand Extended Connectivity (PLEC) Fingerprint and Its Application for Binding Affinity Predictions. Bioinformatics. 2019 Apr 15;35(8):1334-1341.
  • Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P. Improving Detection of Protein-Ligand Binding Sites with 3D Segmentation. Sci Rep. 2020 Mar 19;10(1):5035.
  • Poziemski J, Yurkevych A, Siedlecki P. Assessment of Molecular Dynamics Time Series Descriptors in Protein-Ligand Affinity Prediction. ChemRxiv. 2024.
  • Adamczyk J, Poziemski J, Siedlecki P. ApisTox: A New Benchmark Dataset for the Classification of Small Molecule Toxicity on Honey Bees. Sci Data. 2025 Jan 2;12(1):5.
  • Jastrząb R, Tomecki R, Jurkiewicz A, Graczyk D, Szczepankowska AK, Mytych J, Wolman D, Siedlecki P. The Strain-Dependent Cytostatic Activity of Lactococcus lactis on CRC Cell Lines Is Mediated Through the Release of Arginine Deiminase. Microb Cell Fact. 2024 Mar 14;23(1):82.

Collaboration

  • Pedro Ballester, Imperial College London
  • Waldemar Priebe, MD Anderson, Houston
  • Carlo Vascotto, University of Udine

Publications (with IBB PAN affiliation)

Wójcikowski M., Zielenkiewicz P., Siedlecki P. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. Journal of Cheminformatics (2015) 7:26 (6 p.). DOI 10.1186/s13321-015-0078-2. IF 4.547
Uziębło-Życzkowska B., Gielerak G., Siedlecki P., Pająk B. Genetic diversity of SCN5A gene and its possible association with the concealed form of Brugada syndrome development in Polish group of patients. BioMed Research International (2014) Article ID 462609, 13 p. http://dx.doi.org/10.1155/2014/462609. IF 2.706

Team