diff --git a/tutorials/README.md b/tutorials/README.md index bf28271..807b745 100644 --- a/tutorials/README.md +++ b/tutorials/README.md @@ -2,3 +2,11 @@ In this directory, you will find material about how to use `dingo` but also on how you could contribute to this open source project. +## Running tutorials locally + +The tutorial notebook (`dingo_tutorial.ipynb`) was originally designed for [Google Colab](https://colab.research.google.com/github/GeomScale/dingo/blob/develop/tutorials/dingo_tutorial.ipynb). If you are running it locally (e.g., in Jupyter Lab or VS Code), make sure to select the Python kernel that has `dingo` and its dependencies installed. + +- **Jupyter Lab / Jupyter Notebook:** Go to *Kernel > Change kernel* and select the environment where you installed `dingo` (e.g., the Poetry virtual environment). +- **VS Code:** Click the kernel selector in the top-right corner of the notebook editor and choose the correct Python interpreter. +- **Note:** Some cells use Colab-specific commands (e.g., `!pip install`, `%%capture`). You can skip these when running locally, as long as `dingo` is already installed in your environment. + diff --git a/tutorials/dingo_tutorial.ipynb b/tutorials/dingo_tutorial.ipynb index c95aa8f..fe188ea 100644 --- a/tutorials/dingo_tutorial.ipynb +++ b/tutorials/dingo_tutorial.ipynb @@ -1,1690 +1,1701 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "XsGorJWn4Q6U" - }, - "source": [ - "# `dingo` : A python library for metabolic networks sampling and analysis" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "04-ezSozK3QT" - }, - "source": [ - "\n", - "## About this Colab notebook " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-IBI5DoLMDCn" - }, - "source": [ - "The aim of this notebook is to highlight the importance of metabolic networks analysis and the insight it may provide, as well as to demonstrate how to use the `dingo` Python library for the analysis of metabolic networks. \n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "L-pK94S-GPYG" - }, - "source": [ - "For your convenience, we suggest making a copy of the notebook on your Google Drive \n", - "\n", - "(`File > Save a copy in Drive`). \n", - "\n", - "This way you can change things on the notebook and make your own experiments as you like!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zUpwu6ue1MEG" - }, - "source": [ - "* The `&> /dev/null` in the end of a command is so you do not see all the messages returned when the command is executed.\n", - "* The `%%capture` flag is so Google Colab does not print the output of the chunk of code in a cell. We have used it in some cases to avoid long warning messages. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8CKAQaoN1ORJ" - }, - "source": [ - "In case an error comes up when running a notebook's command, you may click \n", - "\n", - "`Runtime > Factory reset runtime` \n", - "\n", - "and run the notebook from scratch." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "7CwAe0jxD5aB" - }, - "source": [ - "## Introduction" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nzzDw-EJKZpy" - }, - "source": [ - "### Genome scale metabolic network reconstruction" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "UdP-wtlwHdpZ" - }, - "source": [ - "🧬 🧬 Over the last decades, vast amounts of data have been gathered mostly thanks to the High Throughput technologies developed. \n", - "\n", - "As a result, the complete sequencing of a bacterial (strain) **genome** is now considered trivial. \n", - "\n", - "Global efforts have lead to large databases supporting the **annotation** of the genes found in a genome." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "H2UE0ORSS_sQ" - }, - "source": [ - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6ZG3nbgBN34k" - }, - "source": [ - "From the genome annotation we can now get the **enzymes** and therefore, the **reactions** that potentially take place in our strain (along with their stoichiometry)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ZwJEvcTfY_fC" - }, - "source": [ - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hKPc6n9Xd5PK" - }, - "source": [ - "And from the sum of the reactions present, along with their corresponding constraints, we can now get the metabolic network of the species. \n", - "Thus, we have now software tools that enable automatic genome-scale reconstructions (e.g. [carveme](https://carveme.readthedocs.io/en/latest/))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "QWnL4ZQxOcTb" - }, - "source": [ - "The corresponding metabolic model is the *mathematical representation* of what happens in the organism along with the interactions between it and its environment. Here is a toy example:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "l7YlhJjSPFRo" - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8V56ZZCY7aic" - }, - "source": [ - "And here is what metabolic networks allow us to see! " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 435 - }, - "id": "G5m2-7t7fUkf", - "outputId": "a053a2c0-967c-4580-94f7-2d9ed20061e3" - }, - "outputs": [], - "source": [ - "#@title Pathway reactions fluxes representation{ display-mode: \"form\" }\n", - "from IPython.display import HTML\n", - "HTML('')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nKY2gxKcf_zg" - }, - "source": [ - "This GIF is from a [presentation](https://gibbons-lab.github.io/isb_course_2020/micom/) from Christian Diener at Gibbons Lab!\n", - " 🤓 " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DrikcKuf7hjt" - }, - "source": [ - "### Metabolic networks analysis" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "N5elFWq-7q6p" - }, - "source": [ - "Now that we do have a metabolic model, the question is *how could someone investigate its properties?*" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "v8cgveHyHdmq" - }, - "source": [ - "So, from each metabolic model we can get a stoichiometric matrix ($S$) of the reactions that take place in the organism along with their corresponding constraints. \n", - "As you can guess, $S$ is nothing but a matrix with $m$ metabolites and $n$ reactions!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e4rCPCwghY_6" - }, - "source": [ - "$S_{m,n} =\n", - " \\begin{pmatrix}\n", - " a_{1,1} & a_{1,2} & \\cdots & a_{1,n} \\\\\n", - " a_{2,1} & a_{2,2} & \\cdots & a_{2,n} \\\\\n", - " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", - " a_{m,1} & a_{m,2} & \\cdots & a_{m,n}\n", - " \\end{pmatrix}$" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BSXWJ5U8hdUK" - }, - "source": [ - "Where each line is a **metabolite** and each column a **reaction**. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "K5-JGM6jeend" - }, - "source": [ - "Considering our system is at steady state, the flux through each reaction is given by the following equation: \n", - "\n", - "$Sv = 0$ \n", - "\n", - "which defines a system of linear equations. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "y76vXL78oEKP" - }, - "source": [ - "The solution space for that system is a **polytope**; a geometric object with \"flat\" sides that lives in a large number of dimensions. \n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "n6NZrAVttexG" - }, - "source": [ - "**Flux Balance Analysis (FBA)** is commonly used to identify an optimal flux distribution by optimizing a linear objective function; e.g the biomass value of the organism. You may see more about FBA [here](https://www.nature.com/articles/nbt.1614/).\n", - "\n", - "\n", - "\n", - "\n", - "Figure from: [Libourel, I. G., & Shachar-Hill, Y.. Annu. Rev. Plant Biol. 59 (2008)](10.1146/annurev.arplant.58.032806.103822)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Ywwrs7BOtkYY" - }, - "source": [ - "**Flux Variability Analysis (FVA)** evaluates the minimum and maximum range of each reaction flux that can still satisfy the constraints of the metabolic model. \n", - "\n", - "To this end FVA uses a double linear programming problem, a maximization and a subsequent minimization for each reaction.\n", - "\n", - "\n", - "\n", - "Figure by a presentation of [Lizbeth Hampton](https://slideplayer.com/slide/13509212/)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "3CvfTpQw44FT" - }, - "source": [ - "Both FBA and FVA are biased methods as they require an objective function and they return either a single or a pair of flux values respectively. \n", - "**Flux sampling** on the other hand is an unbiased method for metabolic network analysis, providing a thorough description of the potential values a flux might get, along with the probability to get it. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "meBkuCBUogRJ" - }, - "source": [ - "Our goal is to perform **uniform sampling** on this solution space (also called **flux space**) to gain an unbiased overview of the model's global features but also to investigate how the model responses in various situations. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "CRPoqy2jpvIa" - }, - "source": [ - "However, sampling can be a rather challenging task both from the computational and the geometical point of view. \n", - "\n", - "💪 To this end, `dingo` makes use of the [Multiphase Monte Carlo Sampling algorithm (MMCS)](https://drops.dagstuhl.de/opus/volltexte/2021/13798/pdf/lipics-vol189-socg2021-complete.pdf) developed by the [GeomScale Org.](https://geomscale.github.io/) too. \n", - "\n", - "💪 Our method splits sampling in phases and makes use of **rounding** steps; this way it is both efficient and returns high quality samples. \n", - "\n", - "\n", - "\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2XOkRFcXsm6u" - }, - "source": [ - "In addition, the MMCS algorithm uses an optimized variant of Billiard Walk with faster arithmetic complexity per step. \n", - "\n", - "Here is an example of how a new point is found. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WX5_VagweffU" - }, - "source": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0pGWGH9Yefhx" - }, - "source": [ - "Finally, the modular design of the method enables the use of any random walk.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "akAjYEubCrTY" - }, - "source": [ - "-------------------------------------------------" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "PqWGwdr9Cs-H" - }, - "source": [ - "## Get and install *dingo*" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "164yhNEXMcSi" - }, - "source": [ - "\n", - "### Requirements" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BpZ3fcqKMcSj" - }, - "source": [ - "Mandatory:\n", - "* a Unix-like environment, e.g Linux, MacOS etc.\n", - "* a computer machine with at least 4GB of RAM; `dingo` requires about 4GB to compile \n", - "\n", - "Optional:\n", - "* for better performance, you may use `dingo` with the [`gurobi`](https://www.gurobi.com) solver. To do so, a license is required (free for academic purposes)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tTvoGY3DMcSj" - }, - "source": [ - "You can get the `gurobipy` Python interface of the `gurobi` solver by running:\n", - "\n", - "`\n", - "pip3 install -i https://pypi.gurobi.com gurobipy\n", - "`\n", - "\n", - "and download an academic license [from here](https://www.gurobi.com/downloads/end-user-license-agreement-academic/). \n", - "\n", - "For more information on how to get `gurobi` you may also have a look [here](https://support.gurobi.com/hc/en-us/articles/360044290292-How-do-I-install-Gurobi-for-Python-)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "UkdSbnJQwJSm" - }, - "source": [ - "> **Remember!**
\n", - "When working on a normal console (terminal) you will have to remove any `!` and `%` characters from the beginning of the following commands; these allow Google Colab to work as a terminal console. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "IQL-qaW4McSk" - }, - "source": [ - "### Installation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dqa089jFDXaH" - }, - "source": [ - "Install the dependencies for PySPQR library; for Debian/Ubuntu Linux, run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "0eBtFIaIzZ66" - }, - "outputs": [], - "source": [ - "!apt-get install libsuitesparse-dev &> /dev/null\n", - "%pip install sparseqr &> /dev/null\n", - "%pip install Cython &> /dev/null\n", - "%pip install cobra &> /dev/null\n", - "%pip install kaleido &> /dev/null\n", - "%pip install pyoptinterface[highs] &> /dev/null" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "yYqOnx7_4ZRN" - }, - "source": [ - "Get the `dingo` repo." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "A6Mcqrqt3_Sa", - "outputId": "e340135f-48c2-4e34-87e6-fa0c54e3024e" - }, - "outputs": [], - "source": [ - "!git clone https://github.com/GeomScale/dingo.git" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qePKvNyVFrHG" - }, - "source": [ - "To load the submodules that `dingo` uses, run:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "n-zpDmAy9V-8", - "outputId": "97e759a1-99ed-4a5b-8953-6113d3b5c315" - }, - "outputs": [], - "source": [ - "%cd dingo/\n", - "!git submodule update --init" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "jb6XxVcbF0h3" - }, - "source": [ - "Then, download and unzip the `boost` library:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vkjI-YD69dkw" - }, - "outputs": [], - "source": [ - "!wget -O boost_1_76_0.tar.bz2 https://archives.boost.io/release/1.76.0/source/boost_1_76_0.tar.bz2 &> /dev/null\n", - "!tar xjf boost_1_76_0.tar.bz2 &> /dev/null\n", - "!rm boost_1_76_0.tar.bz2 &> /dev/null" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "RK_xgnsYAfOr" - }, - "source": [ - "Install `dingo`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Gu9E7fidgjnW" - }, - "source": [ - "The following command will install `dingo` and it will take a while to compile. \n", - "\n", - "Remember that about 4GB of RAM are required to do so. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "hMNc6tbiwhG1", - "outputId": "59572b1f-f2f2-4579-ce74-e43382864040" - }, - "outputs": [], - "source": [ - "!python setup.py install --user &> /dev/null\n", - "print('dingo is ready to go!')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ZJ3AW6QTMcSn" - }, - "source": [ - "### Run tests" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "PLCBCsYiGEXz" - }, - "source": [ - "We have made some tests for you so you can make sure `dingo` has been installed properly.\n", - "\n", - "You may have a look at these [test files](https://github.com/GeomScale/dingo/tree/develop/tests) so you can run `dingo` from your terminal." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "f7q8A5TDxtOS" - }, - "outputs": [], - "source": [ - "%%capture\n", - "!python -m unittest discover tests \"*.py\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rWJ00dGySfol" - }, - "source": [ - "If you look at the output of that command, you should see that every test passed.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "iwZEvB6tC6jc" - }, - "source": [ - "--------------------------------------------------------------------------" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rLfOQgaQuULC" - }, - "source": [ - "The following command is to make Colab have access to the `dingo` library that was just compiled. \n", - "There will probably be **no need for you to do this**. \n", - "In case Python does not find `dingo`, just check the \n", - "semifinal line of the compilation step, in this case:\n", - "\n", - "`Writing /root/.local/lib/python3.7/site-packages/dingo-0.1.0.egg-info`\n", - "\n", - "This is the path where `dingo` was compiled. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ojsTmKVZYocZ" - }, - "outputs": [], - "source": [ - "import sys\n", - "_ = (sys.path.append(\"/root/.local/lib/python3.8/site-packages/\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Ua5lGNc8C8QE" - }, - "source": [ - "-------------------------------------------------------------------------------" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2dtcobmYQOWp" - }, - "source": [ - "## Run `dingo`.." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "OKZhVhU5McSp" - }, - "source": [ - "### ..to manipulate your model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ljInPZvPMcSp" - }, - "source": [ - "Get the path to the `dingo` directory. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "S7ieYXqfMcSq", - "outputId": "d9ec3348-6c5a-46b5-b815-15238625b216" - }, - "outputs": [], - "source": [ - "import os\n", - "import numpy as np" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "AiFwL4lRugVC" - }, - "source": [ - "And import the `MetabolicNetwork` class of `dingo`! " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "FahzUpQFommK" - }, - "outputs": [], - "source": [ - "from dingo import MetabolicNetwork" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "AzFd-qTWryBW" - }, - "source": [ - "Now, you can load a metabolic model. \n", - "Currently, `dingo` may get either a `.json` or a '.mat` model format." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NNSJM_Z3McSs" - }, - "outputs": [], - "source": [ - "model = MetabolicNetwork.from_json('ext_data/e_coli_core.json')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8BU8YYL9PC-z" - }, - "source": [ - "Let's see what the model comprises. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "jxLug5xuPMcB" - }, - "source": [ - "First of all, let's see the metabolites ($m$) included in our model. \n", - "Here is the first 10 of them. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "_1r63O_AXzTO", - "outputId": "d54ee08c-7aae-4d4b-9aa7-8090bcdd2fb5" - }, - "outputs": [], - "source": [ - "print('Number of metabolites in my model: ', len(model.metabolites))\n", - "print('Here are the first 10 metabolites in your model: ', model.metabolites[:9])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sxaWOTJ-Paoq" - }, - "source": [ - "And now, let us have a look in the reactions ($n$) that occur. \n", - "Again, here is the first 10 reactions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "oEYN_hbeXjpy", - "outputId": "a8dab6dd-9d69-4e35-d98c-3278b55b7f65" - }, - "outputs": [], - "source": [ - "print('Number of reactions in my model: ', len(model.reactions))\n", - "print('Here are the first 10 reactions in your model: ', model.reactions[:9])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qZk9H2aBPrFj" - }, - "source": [ - "Moreover, let us see the stoichiometrix matrix ($S$) of our model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "DurJhomTPzO1", - "outputId": "0d4f107a-7bdb-48ff-e8f8-ee6909476ac2" - }, - "outputs": [], - "source": [ - "model.S" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "wHkrYqfTemnW" - }, - "source": [ - "And its dimensions are of course $m*n$." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Tl52cmuaenlY", - "outputId": "256123a8-4ef3-4c3d-a7e3-3298e3066535" - }, - "outputs": [], - "source": [ - "model.S.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nsGLu7c0d3Ra" - }, - "source": [ - "Last but not least, let us have a look at the **biomass** or **objective function** of our model. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "hpnbfsBVeCFK", - "outputId": "c7bf618f-eb30-4226-986e-752e15caa544" - }, - "outputs": [], - "source": [ - "model.objective_function" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MLXZt3umeGJm" - }, - "source": [ - "As you can see, this array has only a single `1` in index `24`.\n", - "Let's see which function is this. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 - }, - "id": "-04jUh1veQNt", - "outputId": "054be7c9-8378-4609-98e8-28e0cac299ef" - }, - "outputs": [], - "source": [ - "model.reactions[24]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dLi2HZWTjS0e" - }, - "source": [ - "Ok, so what exactly is the biomass function? 🤔 \n", - "\n", - "Let's print the `24` column of our $S$ matrix to see what we have there." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "1c93OKZ7eUPX", - "outputId": "19169c24-22ae-4e5b-b3c9-054ca0270dd5" - }, - "outputs": [], - "source": [ - "# We import numpy and set the threshold parameter to max to enable the display of all the elements of the array\n", - "np.set_printoptions(threshold=sys.maxsize)\n", - "model.S[:,24]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "iJa3YQr7jq2-" - }, - "source": [ - "And just to make sure, let's print the length of this array too. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "wFTLZewLiAH-", - "outputId": "eb1b0b32-e025-426d-f59d-670cb416d8bc" - }, - "outputs": [], - "source": [ - "model.S[:,24].shape" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6TlgVCJijyCO" - }, - "source": [ - "As expected, the length of the biomass array equals to the number of the metabolites of our model. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "EjD7uBgfhYu0", - "outputId": "ada5cb8b-0aa5-4292-a177-274b8f150b79" - }, - "outputs": [], - "source": [ - "len(model.metabolites)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Pq0IOSSdkECR" - }, - "source": [ - "Moreover, you need to remember that the indexes of those 2 arrays are in order, meaning that the actual biomass function would be as follows:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "oQ6GhgTblIND" - }, - "source": [ - "$ 59.81*adp\\_c - 59.81*atp\\_c -0.0709*f6p\\_c + ... -0.8977*r5p\\_c$" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "NmVjo47VP8xj" - }, - "source": [ - "You can also have a look at the `dingo` related parameters of the model you just built." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "NuVrc64WYJbJ", - "outputId": "44747cd1-43f8-4dc0-8ebb-321218f08714" - }, - "outputs": [], - "source": [ - "model.parameters" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ctpZFRndQGrb" - }, - "source": [ - "### ..to optimize an objective function (FBA)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "uccGKr_hSJCM" - }, - "source": [ - "To perform FBA with `dingo`, you just need to call the according method ([`fba`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/MetabolicNetwork.py#L107)) of the `model` you built using the [`MetabolicNetwork`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/MetabolicNetwork.py#L21) class. \n", - "\n", - "FBA is a lot easier both from the computational and the geometical point of view and it takes much less time. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "kWFbcJumShgh" - }, - "outputs": [], - "source": [ - "%%capture\n", - "fba_output = model.fba()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Ql-cIs8JxLuJ" - }, - "source": [ - "As already mentioned, FBA identifies an **optimal** flux distribution by optimizing a linear **objective function**.\n", - "\n", - "So the output of FBA can be considered as a single sample of the flux sampling method described above. Thus, if we check on the output this time we will see that it has 2 attributes:\n", - "* the **flux vector** (meaning the set of the reactions flux values in the same solution) which maximizes the objective function.\n", - "* the **flux value** that the objective function achieves at the maximum" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Nxk0qI0ISmOB", - "outputId": "3bdaaffe-b7b2-464a-ec83-ad3a086c76d4" - }, - "outputs": [], - "source": [ - "np.set_printoptions(threshold=None)\n", - "\n", - "max_biomass_flux_vector = fba_output[0]\n", - "max_biomass_objective = fba_output[1]\n", - "\n", - "print('This is the optimal flux vector according to the biomass objective function: ', max_biomass_flux_vector)\n", - "print('----------')\n", - "print('This is the value of that optimal solution of the objective function: ', max_biomass_objective)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rKBVhQI9SlbO" - }, - "source": [ - "💪 Even it is quite common that the objective function is the biomass function of a model, **you can change your objective function** as you wish and use any reaction as your objective function.\n", - "\n", - "For example, to maximize the 1st reaction of the model, we first build an array with the number of reactions of our model with `0`s and then we set only its first elemet equal to `1`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "fdLlAbhA0Ylg" - }, - "outputs": [], - "source": [ - "n = model.num_of_reactions()\n", - "obj_fun = np.zeros(n)\n", - "obj_fun[0] = 1" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "jWEhcWmJm1Qu" - }, - "source": [ - "Now we can replace the `objective_function` parameter of our model with our new objective function" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "jrASE_SS2Nst" - }, - "outputs": [], - "source": [ - "model.objective_function = obj_fun" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "YXt-5pFlnCcm" - }, - "source": [ - "And if we now ask to see our new objective function: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Lf8jpTrU1dEr", - "outputId": "14e6bd8c-4778-4f1b-9ef2-516bc4a9c9e3" - }, - "outputs": [], - "source": [ - "model.objective_function" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "_63UO5TqnMlp" - }, - "source": [ - "You see that the `1` is now not in the `24` index but in the first. 🎉" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "YTd07QyGnZwX" - }, - "source": [ - "So if we run now a FBA here is what we will get. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "f11M0AClnfhZ" - }, - "outputs": [], - "source": [ - "%%capture\n", - "fba_output_new_obj = model.fba()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "80op83f4noZK", - "outputId": "41729b2b-8a84-47f6-82f6-d218fa6040a4" - }, - "outputs": [], - "source": [ - "max_biomass_flux_vector_new_obj = fba_output_new_obj[0]\n", - "max_biomass_objective_new_obj = fba_output_new_obj[1]\n", - "print('This is the optimal flux vector according to my objective function: ', max_biomass_flux_vector_new_obj)\n", - "print('----------')\n", - "print('This is the value of that optimal solution of the new objective function: ', max_biomass_objective_new_obj)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "b1ZAlpHY6Ifm" - }, - "source": [ - "### ..to find the minimum and maximum flux for each reaction (FVA)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MDxY8bCiMcS7" - }, - "source": [ - "Let's run FVA using that; this will take a while 😉" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 - }, - "id": "OaxWPyakpriX", - "outputId": "e673ebb2-a8dc-44de-8b71-a44606100610" - }, - "outputs": [], - "source": [ - "model.reactions[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ex3H0OXtpGr1" - }, - "source": [ - "As we said earlier, FVA gets the min and max value of each reaction flux that can still satisfy the constraints of the metabolic model.\n", - "\n", - "**Remember!** \n", - "\n", - "In the previous step, we replaced the `objective_function` of our model. So if we run FVA in our current model, then the first reaction of our model " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "AK5OdFU4yZZL" - }, - "outputs": [], - "source": [ - "%%capture\n", - "# Run FVA\n", - "fva_output = model.fva()\n", - "\n", - "# Get min and max flux values\n", - "pfk_min_fluxes = fva_output[0]\n", - "pfk_max_fluxes = fva_output[1]\n", - "\n", - "# Get the max flux distribution and the max biomass value when the objective function is maximum\n", - "pfk_max_biomass_flux_vector = fva_output[2]\n", - "pfk_max_biomass_objective = fva_output[3]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qGVto8ckp7LS" - }, - "source": [ - "We can go with that or if we need to go back in using the actual biomass function of our model, we can either do the same process as before to change again the `objective_function` or we can just build a new instance of the `MetabolicNetwork` class. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Mk-CtSOJLWo8" - }, - "outputs": [], - "source": [ - "%%capture\n", - "# Build new instance of the model\n", - "model = MetabolicNetwork.from_json('ext_data/e_coli_core.json')\n", - "\n", - "# Run FVA\n", - "fva_output = model.fva()\n", - "\n", - "# Get min and max flux values\n", - "min_fluxes = fva_output[0]\n", - "max_fluxes = fva_output[1]\n", - "\n", - "# Get the max flux distribution and the max biomass value when the objective function is maximum\n", - "max_biomass_flux_vector = fva_output[2]\n", - "max_biomass_objective = fva_output[3]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "p81io8bFMcS_" - }, - "source": [ - "Let's see how they're different!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "XYIkjxGxMcS_", - "outputId": "6b75b7b1-d749-4a8b-9ab0-6d01ce7b7f00" - }, - "outputs": [], - "source": [ - "print('This is the flux vector with the minimum values of each flux when the PFK reaction gets maximum: ', pfk_min_fluxes)\n", - "print('This is the flux vector with the minimum values of each flux when the biomass function gets maximum: ', min_fluxes)\n", - "\n", - "print('----------')\n", - "\n", - "print('This is the flux vector with the maximum values of each flux when the PFK reaction gets maximum: ', pfk_max_fluxes)\n", - "print('This is the flux vector with the maximum values of each flux when the biomass function gets maximum: ', max_fluxes)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "fF9wB7y6McS_" - }, - "source": [ - "Apparently (and obviously) they are quite different!\n", - "\n", - "The same thing is with the flux vector and the max biomass objective." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Bzol1NZmTAco" - }, - "source": [ - "### ..to sample the flux space of your model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gH3q_ImwT8H_" - }, - "source": [ - "Finally, one may sample the model's flux space using the [`PolytopeSampler`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/PolytopeSampler.py#L28) class of `dingo`. \n", - "\n", - "To do so, you first build an instance of this class based on your `model`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "HHXSrEelXh4o" - }, - "outputs": [], - "source": [ - "from dingo import PolytopeSampler\n", - "sampler = PolytopeSampler(model)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TqJZftR1kMgC" - }, - "source": [ - "Then, the [`generate_steady_states()`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/PolytopeSampler.py#L119) method can be used for flux sampling. \n", - "\n", - "This method optionally gets as arguments the following: \n", - "\n", - "* the **effective sample size (ESS)** asked (by default `ESS = 1000`). ESS is the number of effectively independent draws from the posterior distribution that the Markov chain is equivalent to.\n", - " \n", - "* the **Potential Scale Reduction Factor (PSRF)** of each marginal flux to get an upper bound equal to 1.1 (by default that is set as `False`)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "xo0P1z7BMcTA" - }, - "source": [ - "So, let's sample!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "0y_TOtcGUmAW" - }, - "outputs": [], - "source": [ - "%%capture\n", - "steady_states = sampler.generate_steady_states(ess = 1000, psrf = True) # this took a little bit more than 5 minutes" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gy25rKbHUsVS" - }, - "source": [ - "Above, a great number of values of the flux of each reaction is computed, given that the system is always at a steady state. \n", - "\n", - "That means that we now have a **flux distribution** (all the values computed for the flux of a specific reaction) for each reaction of our model!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "P-qbALhEY7lu" - }, - "source": [ - "Obviously, we have as many flux distributions as the reactions of our model:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "5F1MNZZHZCU0", - "outputId": "356fe326-1342-4127-fc56-00afede234b9" - }, - "outputs": [], - "source": [ - "print('Number of reactions in my model: ', len(model.reactions))\n", - "print('Number of flux distributions sampled: ', steady_states.shape[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MOG6YGueZzpO" - }, - "source": [ - "And each flux distribution has as many flux values as `dingo` required to reach the ESS asked. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "767f9ouRagU5", - "outputId": "e330ec6b-4c57-4294-f81b-84b17e73f09e" - }, - "outputs": [], - "source": [ - "print('Number of flux values of the flux distributions returned: ', steady_states.shape[1])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "svc47hxNbo6L" - }, - "source": [ - "Let us now assume that we are interested in a certain reaction of our model, for example the last reaction of the glycolysis which is catalyzed by the Pyruvate kinase (PYK) enzyme. \n", - "\n", - "We first need to look in the `model.reactions` list and find the index of the `PYK` reaction; the enzyme is usually the name of reaction. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "-im2Cgq5b3oS", - "outputId": "f0f9948a-99ef-46de-d05a-e21e49164011" - }, - "outputs": [], - "source": [ - "model.reactions.index('PYK')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "k6uheDpHfiHq" - }, - "source": [ - "Now that we know the index of the reaction of our interest, we can use the [`plot_histogram`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/utils.py#L177) function of `dingo` to plot its corresponding flux distribution. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 467 - }, - "id": "-n2HKr82yKyh", - "outputId": "2d924fe9-975b-482c-9da7-016c002de61c" - }, - "outputs": [], - "source": [ - "from dingo import plot_histogram\n", - "\n", - "reactions = model.reactions\n", - "plot_histogram(\n", - " steady_states[23], # here we set which reaction's flux we need to get \n", - " reactions[23], # here we provide the name of the reaction\n", - " n_bins = 60,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "jl7tJ-ObS6L3" - }, - "source": [ - "From the flux distribution of PYK we can that most values fall between 1.7 and 1.9 and that it looks approximately normal, with a mean value at 1.775 mmol/gDW/h (millimoles per gram dry weight per hour).\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 617 - }, - "id": "pQskY7EeWPdv", - "outputId": "715833d7-fc8f-452e-e0b2-3fdcf2532848" - }, - "outputs": [], - "source": [ - "from dingo.illustrations import plot_copula\n", - "\n", - "reaction_1 = [steady_states[23], reactions[23]]\n", - "reaction_2 = [steady_states[21], reactions[21]]\n", - "\n", - "plot_copula(reaction_1, reaction_2, n = 5, width = 900 , height = 600, export_format = \"svg\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "IFsATtLyl6ak" - }, - "source": [ - "Applications of flux sampling can provide great insight in many ways. \n", - "\n", - "Based on the [iAB-AMØ-1410](https://doi.org/10.1038/msb.2010.68) genome-scale metabolic model of human alveolar macrophages and the SARS-CoV-2 virus biomass objective function (VBOF) they built, [Renz et al. (2020)](https://doi.org/10.1093/bioinformatics/btaa813) investigated for alterations in the metabolism to come up with potential antiviral targets. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "EgImVdw8sFk_" - }, - "source": [ - "In line with the findings of Renz et al. (2020), flux sampling in the metabolic model built from the macrophage model and the VBOF using `dingo` indicates **guanylate kinase (GK1)** as a potential antiviral target. \n", - "Flux sampling was perfored first with no objective function for the virus and then after maximizing the VBOF built by Renz et al. (2020). \n", - "\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "v81QKmL3McTF" - }, - "source": [ - "\n", - "------\n", - "\n", - "We hope you will find dingo useful and easy-to-use!\n", - "\n", - "For anything which might be needed, don't hesitate to open an issue on the project's GitHub repo!\n", - "\n", - "\n", - "---" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "RHE9TmXCMcTF" - }, - "source": [ - "## References" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "jYNJd6K9qgHB" - }, - "source": [ - "* Palsson Bernhard. [Systems biology](https://www.cambridge.org/core/books/systems-biology/7F8445BC87019806B3625DFC4B5C27D4). Cambridge university press, 2015. \n", - "\n", - "* Orth Jeffrey D., Ines Thiele, and Bernhard Ø. Palsson. [\"What is flux balance analysis?.\"](https://doi.org/10.1038/nbt.1614) Nature biotechnology 28.3 (2010): 245-248.\n", - "\n", - "* Chalkis Apostolos, Fisikopoulos Vissarion, Tsigaridas Elias and Zafeiropoulos Haris [\"Geometric Algorithms for Sampling the Flux Space of Metabolic Networks\"](https://drops.dagstuhl.de/opus/volltexte/2021/13820/pdf/LIPIcs-SoCG-2021-21.pdf)\n", - "\n", - "* Schellenberger, Jan, and Bernhard Ø. Palsson. [\"Use of randomized sampling for analysis of metabolic networks.\"](https://doi.org/10.1074/jbc.R800048200) Journal of biological chemistry 284, no. 9 (2009): 5457-5461." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dpn2LsGIIpTV" - }, - "source": [ - "## The GeomScale project" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qjgmQMNcItsV" - }, - "source": [ - "[Geomscale](https://geomscale.github.io/) is a research and development \n", - "project that delivers open source code for state-of-the-art algorithms at the intersection of data science, optimization, geometric, and statistical computing. \n", - "\n", - "`dingo` is based on the [`volesti` package](https://github.com/GeomScale/volesti), an open source C++ library for volume approximation and sampling of convex bodies.\n", - "\n", - "GeomScale is part of [NumFOCUS](https://numfocus.org/) and is a Google Summer of Code [(GSoC) organization](https://summerofcode.withgoogle.com/organizations/5553085268623360/). \n", - "\n", - "
\n", - "\n", - "" - ] - } - ], - "metadata": { + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "XsGorJWn4Q6U" + }, + "source": [ + "# `dingo` : A python library for metabolic networks sampling and analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "04-ezSozK3QT" + }, + "source": [ + "\n", + "## About this Colab notebook " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **Running locally?** If you are running this notebook outside of Google Colab ", + "(e.g., in Jupyter Lab or VS Code), make sure to select the Python kernel/environment ", + "where `dingo` and its dependencies are installed. You can skip the Colab-specific ", + "installation cells (`!pip install ...`, `%%capture`) if `dingo` is already set up in ", + "your environment. See the [tutorials README](README.md) for more details." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-IBI5DoLMDCn" + }, + "source": [ + "The aim of this notebook is to highlight the importance of metabolic networks analysis and the insight it may provide, as well as to demonstrate how to use the `dingo` Python library for the analysis of metabolic networks. \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L-pK94S-GPYG" + }, + "source": [ + "For your convenience, we suggest making a copy of the notebook on your Google Drive \n", + "\n", + "(`File > Save a copy in Drive`). \n", + "\n", + "This way you can change things on the notebook and make your own experiments as you like!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zUpwu6ue1MEG" + }, + "source": [ + "* The `&> /dev/null` in the end of a command is so you do not see all the messages returned when the command is executed.\n", + "* The `%%capture` flag is so Google Colab does not print the output of the chunk of code in a cell. We have used it in some cases to avoid long warning messages. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8CKAQaoN1ORJ" + }, + "source": [ + "In case an error comes up when running a notebook's command, you may click \n", + "\n", + "`Runtime > Factory reset runtime` \n", + "\n", + "and run the notebook from scratch." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7CwAe0jxD5aB" + }, + "source": [ + "## Introduction" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nzzDw-EJKZpy" + }, + "source": [ + "### Genome scale metabolic network reconstruction" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UdP-wtlwHdpZ" + }, + "source": [ + "\ud83e\uddec \ud83e\uddec Over the last decades, vast amounts of data have been gathered mostly thanks to the High Throughput technologies developed. \n", + "\n", + "As a result, the complete sequencing of a bacterial (strain) **genome** is now considered trivial. \n", + "\n", + "Global efforts have lead to large databases supporting the **annotation** of the genes found in a genome." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H2UE0ORSS_sQ" + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6ZG3nbgBN34k" + }, + "source": [ + "From the genome annotation we can now get the **enzymes** and therefore, the **reactions** that potentially take place in our strain (along with their stoichiometry)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZwJEvcTfY_fC" + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hKPc6n9Xd5PK" + }, + "source": [ + "And from the sum of the reactions present, along with their corresponding constraints, we can now get the metabolic network of the species. \n", + "Thus, we have now software tools that enable automatic genome-scale reconstructions (e.g. [carveme](https://carveme.readthedocs.io/en/latest/))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QWnL4ZQxOcTb" + }, + "source": [ + "The corresponding metabolic model is the *mathematical representation* of what happens in the organism along with the interactions between it and its environment. Here is a toy example:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l7YlhJjSPFRo" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8V56ZZCY7aic" + }, + "source": [ + "And here is what metabolic networks allow us to see! " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { "colab": { - "provenance": [], - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3.10.6 64-bit", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - }, - "vscode": { - "interpreter": { - "hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a" - } - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} + "base_uri": "https://localhost:8080/", + "height": 435 + }, + "id": "G5m2-7t7fUkf", + "outputId": "a053a2c0-967c-4580-94f7-2d9ed20061e3" + }, + "outputs": [], + "source": [ + "#@title Pathway reactions fluxes representation{ display-mode: \"form\" }\n", + "from IPython.display import HTML\n", + "HTML('')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nKY2gxKcf_zg" + }, + "source": [ + "This GIF is from a [presentation](https://gibbons-lab.github.io/isb_course_2020/micom/) from Christian Diener at Gibbons Lab!\n", + " \ud83e\udd13 " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DrikcKuf7hjt" + }, + "source": [ + "### Metabolic networks analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N5elFWq-7q6p" + }, + "source": [ + "Now that we do have a metabolic model, the question is *how could someone investigate its properties?*" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v8cgveHyHdmq" + }, + "source": [ + "So, from each metabolic model we can get a stoichiometric matrix ($S$) of the reactions that take place in the organism along with their corresponding constraints. \n", + "As you can guess, $S$ is nothing but a matrix with $m$ metabolites and $n$ reactions!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e4rCPCwghY_6" + }, + "source": [ + "$S_{m,n} =\n", + " \\begin{pmatrix}\n", + " a_{1,1} & a_{1,2} & \\cdots & a_{1,n} \\\\\n", + " a_{2,1} & a_{2,2} & \\cdots & a_{2,n} \\\\\n", + " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", + " a_{m,1} & a_{m,2} & \\cdots & a_{m,n}\n", + " \\end{pmatrix}$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BSXWJ5U8hdUK" + }, + "source": [ + "Where each line is a **metabolite** and each column a **reaction**. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K5-JGM6jeend" + }, + "source": [ + "Considering our system is at steady state, the flux through each reaction is given by the following equation: \n", + "\n", + "$Sv = 0$ \n", + "\n", + "which defines a system of linear equations. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y76vXL78oEKP" + }, + "source": [ + "The solution space for that system is a **polytope**; a geometric object with \"flat\" sides that lives in a large number of dimensions. \n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n6NZrAVttexG" + }, + "source": [ + "**Flux Balance Analysis (FBA)** is commonly used to identify an optimal flux distribution by optimizing a linear objective function; e.g the biomass value of the organism. You may see more about FBA [here](https://www.nature.com/articles/nbt.1614/).\n", + "\n", + "\n", + "\n", + "\n", + "Figure from: [Libourel, I. G., & Shachar-Hill, Y.. Annu. Rev. Plant Biol. 59 (2008)](10.1146/annurev.arplant.58.032806.103822)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ywwrs7BOtkYY" + }, + "source": [ + "**Flux Variability Analysis (FVA)** evaluates the minimum and maximum range of each reaction flux that can still satisfy the constraints of the metabolic model. \n", + "\n", + "To this end FVA uses a double linear programming problem, a maximization and a subsequent minimization for each reaction.\n", + "\n", + "\n", + "\n", + "Figure by a presentation of [Lizbeth Hampton](https://slideplayer.com/slide/13509212/)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3CvfTpQw44FT" + }, + "source": [ + "Both FBA and FVA are biased methods as they require an objective function and they return either a single or a pair of flux values respectively. \n", + "**Flux sampling** on the other hand is an unbiased method for metabolic network analysis, providing a thorough description of the potential values a flux might get, along with the probability to get it. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "meBkuCBUogRJ" + }, + "source": [ + "Our goal is to perform **uniform sampling** on this solution space (also called **flux space**) to gain an unbiased overview of the model's global features but also to investigate how the model responses in various situations. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CRPoqy2jpvIa" + }, + "source": [ + "However, sampling can be a rather challenging task both from the computational and the geometical point of view. \n", + "\n", + "\ud83d\udcaa To this end, `dingo` makes use of the [Multiphase Monte Carlo Sampling algorithm (MMCS)](https://drops.dagstuhl.de/opus/volltexte/2021/13798/pdf/lipics-vol189-socg2021-complete.pdf) developed by the [GeomScale Org.](https://geomscale.github.io/) too. \n", + "\n", + "\ud83d\udcaa Our method splits sampling in phases and makes use of **rounding** steps; this way it is both efficient and returns high quality samples. \n", + "\n", + "\n", + "\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2XOkRFcXsm6u" + }, + "source": [ + "In addition, the MMCS algorithm uses an optimized variant of Billiard Walk with faster arithmetic complexity per step. \n", + "\n", + "Here is an example of how a new point is found. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WX5_VagweffU" + }, + "source": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0pGWGH9Yefhx" + }, + "source": [ + "Finally, the modular design of the method enables the use of any random walk.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "akAjYEubCrTY" + }, + "source": [ + "-------------------------------------------------" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PqWGwdr9Cs-H" + }, + "source": [ + "## Get and install *dingo*" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "164yhNEXMcSi" + }, + "source": [ + "\n", + "### Requirements" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BpZ3fcqKMcSj" + }, + "source": [ + "Mandatory:\n", + "* a Unix-like environment, e.g Linux, MacOS etc.\n", + "* a computer machine with at least 4GB of RAM; `dingo` requires about 4GB to compile \n", + "\n", + "Optional:\n", + "* for better performance, you may use `dingo` with the [`gurobi`](https://www.gurobi.com) solver. To do so, a license is required (free for academic purposes)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tTvoGY3DMcSj" + }, + "source": [ + "You can get the `gurobipy` Python interface of the `gurobi` solver by running:\n", + "\n", + "`\n", + "pip3 install -i https://pypi.gurobi.com gurobipy\n", + "`\n", + "\n", + "and download an academic license [from here](https://www.gurobi.com/downloads/end-user-license-agreement-academic/). \n", + "\n", + "For more information on how to get `gurobi` you may also have a look [here](https://support.gurobi.com/hc/en-us/articles/360044290292-How-do-I-install-Gurobi-for-Python-)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UkdSbnJQwJSm" + }, + "source": [ + "> **Remember!**
\n", + "When working on a normal console (terminal) you will have to remove any `!` and `%` characters from the beginning of the following commands; these allow Google Colab to work as a terminal console. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IQL-qaW4McSk" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dqa089jFDXaH" + }, + "source": [ + "Install the dependencies for PySPQR library; for Debian/Ubuntu Linux, run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0eBtFIaIzZ66" + }, + "outputs": [], + "source": [ + "!apt-get install libsuitesparse-dev &> /dev/null\n", + "%pip install sparseqr &> /dev/null\n", + "%pip install Cython &> /dev/null\n", + "%pip install cobra &> /dev/null\n", + "%pip install kaleido &> /dev/null\n", + "%pip install pyoptinterface[highs] &> /dev/null" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yYqOnx7_4ZRN" + }, + "source": [ + "Get the `dingo` repo." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "A6Mcqrqt3_Sa", + "outputId": "e340135f-48c2-4e34-87e6-fa0c54e3024e" + }, + "outputs": [], + "source": [ + "!git clone https://github.com/GeomScale/dingo.git" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qePKvNyVFrHG" + }, + "source": [ + "To load the submodules that `dingo` uses, run:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "n-zpDmAy9V-8", + "outputId": "97e759a1-99ed-4a5b-8953-6113d3b5c315" + }, + "outputs": [], + "source": [ + "%cd dingo/\n", + "!git submodule update --init" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jb6XxVcbF0h3" + }, + "source": [ + "Then, download and unzip the `boost` library:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vkjI-YD69dkw" + }, + "outputs": [], + "source": [ + "!wget -O boost_1_76_0.tar.bz2 https://archives.boost.io/release/1.76.0/source/boost_1_76_0.tar.bz2 &> /dev/null\n", + "!tar xjf boost_1_76_0.tar.bz2 &> /dev/null\n", + "!rm boost_1_76_0.tar.bz2 &> /dev/null" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RK_xgnsYAfOr" + }, + "source": [ + "Install `dingo`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Gu9E7fidgjnW" + }, + "source": [ + "The following command will install `dingo` and it will take a while to compile. \n", + "\n", + "Remember that about 4GB of RAM are required to do so. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hMNc6tbiwhG1", + "outputId": "59572b1f-f2f2-4579-ce74-e43382864040" + }, + "outputs": [], + "source": [ + "!python setup.py install --user &> /dev/null\n", + "print('dingo is ready to go!')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZJ3AW6QTMcSn" + }, + "source": [ + "### Run tests" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PLCBCsYiGEXz" + }, + "source": [ + "We have made some tests for you so you can make sure `dingo` has been installed properly.\n", + "\n", + "You may have a look at these [test files](https://github.com/GeomScale/dingo/tree/develop/tests) so you can run `dingo` from your terminal." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f7q8A5TDxtOS" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!python -m unittest discover tests \"*.py\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rWJ00dGySfol" + }, + "source": [ + "If you look at the output of that command, you should see that every test passed.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iwZEvB6tC6jc" + }, + "source": [ + "--------------------------------------------------------------------------" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rLfOQgaQuULC" + }, + "source": [ + "The following command is to make Colab have access to the `dingo` library that was just compiled. \n", + "There will probably be **no need for you to do this**. \n", + "In case Python does not find `dingo`, just check the \n", + "semifinal line of the compilation step, in this case:\n", + "\n", + "`Writing /root/.local/lib/python3.7/site-packages/dingo-0.1.0.egg-info`\n", + "\n", + "This is the path where `dingo` was compiled. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ojsTmKVZYocZ" + }, + "outputs": [], + "source": [ + "import sys\n", + "_ = (sys.path.append(\"/root/.local/lib/python3.8/site-packages/\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ua5lGNc8C8QE" + }, + "source": [ + "-------------------------------------------------------------------------------" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2dtcobmYQOWp" + }, + "source": [ + "## Run `dingo`.." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OKZhVhU5McSp" + }, + "source": [ + "### ..to manipulate your model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ljInPZvPMcSp" + }, + "source": [ + "Get the path to the `dingo` directory. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "S7ieYXqfMcSq", + "outputId": "d9ec3348-6c5a-46b5-b815-15238625b216" + }, + "outputs": [], + "source": [ + "import os\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AiFwL4lRugVC" + }, + "source": [ + "And import the `MetabolicNetwork` class of `dingo`! " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FahzUpQFommK" + }, + "outputs": [], + "source": [ + "from dingo import MetabolicNetwork" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AzFd-qTWryBW" + }, + "source": [ + "Now, you can load a metabolic model. \n", + "Currently, `dingo` may get either a `.json` or a '.mat` model format." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NNSJM_Z3McSs" + }, + "outputs": [], + "source": [ + "model = MetabolicNetwork.from_json('ext_data/e_coli_core.json')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8BU8YYL9PC-z" + }, + "source": [ + "Let's see what the model comprises. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jxLug5xuPMcB" + }, + "source": [ + "First of all, let's see the metabolites ($m$) included in our model. \n", + "Here is the first 10 of them. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_1r63O_AXzTO", + "outputId": "d54ee08c-7aae-4d4b-9aa7-8090bcdd2fb5" + }, + "outputs": [], + "source": [ + "print('Number of metabolites in my model: ', len(model.metabolites))\n", + "print('Here are the first 10 metabolites in your model: ', model.metabolites[:9])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sxaWOTJ-Paoq" + }, + "source": [ + "And now, let us have a look in the reactions ($n$) that occur. \n", + "Again, here is the first 10 reactions. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "oEYN_hbeXjpy", + "outputId": "a8dab6dd-9d69-4e35-d98c-3278b55b7f65" + }, + "outputs": [], + "source": [ + "print('Number of reactions in my model: ', len(model.reactions))\n", + "print('Here are the first 10 reactions in your model: ', model.reactions[:9])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qZk9H2aBPrFj" + }, + "source": [ + "Moreover, let us see the stoichiometrix matrix ($S$) of our model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DurJhomTPzO1", + "outputId": "0d4f107a-7bdb-48ff-e8f8-ee6909476ac2" + }, + "outputs": [], + "source": [ + "model.S" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wHkrYqfTemnW" + }, + "source": [ + "And its dimensions are of course $m*n$." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Tl52cmuaenlY", + "outputId": "256123a8-4ef3-4c3d-a7e3-3298e3066535" + }, + "outputs": [], + "source": [ + "model.S.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nsGLu7c0d3Ra" + }, + "source": [ + "Last but not least, let us have a look at the **biomass** or **objective function** of our model. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hpnbfsBVeCFK", + "outputId": "c7bf618f-eb30-4226-986e-752e15caa544" + }, + "outputs": [], + "source": [ + "model.objective_function" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MLXZt3umeGJm" + }, + "source": [ + "As you can see, this array has only a single `1` in index `24`.\n", + "Let's see which function is this. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "-04jUh1veQNt", + "outputId": "054be7c9-8378-4609-98e8-28e0cac299ef" + }, + "outputs": [], + "source": [ + "model.reactions[24]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dLi2HZWTjS0e" + }, + "source": [ + "Ok, so what exactly is the biomass function? \ud83e\udd14 \n", + "\n", + "Let's print the `24` column of our $S$ matrix to see what we have there." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "1c93OKZ7eUPX", + "outputId": "19169c24-22ae-4e5b-b3c9-054ca0270dd5" + }, + "outputs": [], + "source": [ + "# We import numpy and set the threshold parameter to max to enable the display of all the elements of the array\n", + "np.set_printoptions(threshold=sys.maxsize)\n", + "model.S[:,24]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iJa3YQr7jq2-" + }, + "source": [ + "And just to make sure, let's print the length of this array too. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wFTLZewLiAH-", + "outputId": "eb1b0b32-e025-426d-f59d-670cb416d8bc" + }, + "outputs": [], + "source": [ + "model.S[:,24].shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6TlgVCJijyCO" + }, + "source": [ + "As expected, the length of the biomass array equals to the number of the metabolites of our model. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "EjD7uBgfhYu0", + "outputId": "ada5cb8b-0aa5-4292-a177-274b8f150b79" + }, + "outputs": [], + "source": [ + "len(model.metabolites)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Pq0IOSSdkECR" + }, + "source": [ + "Moreover, you need to remember that the indexes of those 2 arrays are in order, meaning that the actual biomass function would be as follows:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oQ6GhgTblIND" + }, + "source": [ + "$ 59.81*adp\\_c - 59.81*atp\\_c -0.0709*f6p\\_c + ... -0.8977*r5p\\_c$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NmVjo47VP8xj" + }, + "source": [ + "You can also have a look at the `dingo` related parameters of the model you just built." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "NuVrc64WYJbJ", + "outputId": "44747cd1-43f8-4dc0-8ebb-321218f08714" + }, + "outputs": [], + "source": [ + "model.parameters" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ctpZFRndQGrb" + }, + "source": [ + "### ..to optimize an objective function (FBA)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uccGKr_hSJCM" + }, + "source": [ + "To perform FBA with `dingo`, you just need to call the according method ([`fba`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/MetabolicNetwork.py#L107)) of the `model` you built using the [`MetabolicNetwork`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/MetabolicNetwork.py#L21) class. \n", + "\n", + "FBA is a lot easier both from the computational and the geometical point of view and it takes much less time. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kWFbcJumShgh" + }, + "outputs": [], + "source": [ + "%%capture\n", + "fba_output = model.fba()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ql-cIs8JxLuJ" + }, + "source": [ + "As already mentioned, FBA identifies an **optimal** flux distribution by optimizing a linear **objective function**.\n", + "\n", + "So the output of FBA can be considered as a single sample of the flux sampling method described above. Thus, if we check on the output this time we will see that it has 2 attributes:\n", + "* the **flux vector** (meaning the set of the reactions flux values in the same solution) which maximizes the objective function.\n", + "* the **flux value** that the objective function achieves at the maximum" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Nxk0qI0ISmOB", + "outputId": "3bdaaffe-b7b2-464a-ec83-ad3a086c76d4" + }, + "outputs": [], + "source": [ + "np.set_printoptions(threshold=None)\n", + "\n", + "max_biomass_flux_vector = fba_output[0]\n", + "max_biomass_objective = fba_output[1]\n", + "\n", + "print('This is the optimal flux vector according to the biomass objective function: ', max_biomass_flux_vector)\n", + "print('----------')\n", + "print('This is the value of that optimal solution of the objective function: ', max_biomass_objective)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rKBVhQI9SlbO" + }, + "source": [ + "\ud83d\udcaa Even it is quite common that the objective function is the biomass function of a model, **you can change your objective function** as you wish and use any reaction as your objective function.\n", + "\n", + "For example, to maximize the 1st reaction of the model, we first build an array with the number of reactions of our model with `0`s and then we set only its first elemet equal to `1`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fdLlAbhA0Ylg" + }, + "outputs": [], + "source": [ + "n = model.num_of_reactions()\n", + "obj_fun = np.zeros(n)\n", + "obj_fun[0] = 1" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jWEhcWmJm1Qu" + }, + "source": [ + "Now we can replace the `objective_function` parameter of our model with our new objective function" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jrASE_SS2Nst" + }, + "outputs": [], + "source": [ + "model.objective_function = obj_fun" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YXt-5pFlnCcm" + }, + "source": [ + "And if we now ask to see our new objective function: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Lf8jpTrU1dEr", + "outputId": "14e6bd8c-4778-4f1b-9ef2-516bc4a9c9e3" + }, + "outputs": [], + "source": [ + "model.objective_function" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_63UO5TqnMlp" + }, + "source": [ + "You see that the `1` is now not in the `24` index but in the first. \ud83c\udf89" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YTd07QyGnZwX" + }, + "source": [ + "So if we run now a FBA here is what we will get. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f11M0AClnfhZ" + }, + "outputs": [], + "source": [ + "%%capture\n", + "fba_output_new_obj = model.fba()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "80op83f4noZK", + "outputId": "41729b2b-8a84-47f6-82f6-d218fa6040a4" + }, + "outputs": [], + "source": [ + "max_biomass_flux_vector_new_obj = fba_output_new_obj[0]\n", + "max_biomass_objective_new_obj = fba_output_new_obj[1]\n", + "print('This is the optimal flux vector according to my objective function: ', max_biomass_flux_vector_new_obj)\n", + "print('----------')\n", + "print('This is the value of that optimal solution of the new objective function: ', max_biomass_objective_new_obj)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b1ZAlpHY6Ifm" + }, + "source": [ + "### ..to find the minimum and maximum flux for each reaction (FVA)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MDxY8bCiMcS7" + }, + "source": [ + "Let's run FVA using that; this will take a while \ud83d\ude09" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "OaxWPyakpriX", + "outputId": "e673ebb2-a8dc-44de-8b71-a44606100610" + }, + "outputs": [], + "source": [ + "model.reactions[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ex3H0OXtpGr1" + }, + "source": [ + "As we said earlier, FVA gets the min and max value of each reaction flux that can still satisfy the constraints of the metabolic model.\n", + "\n", + "**Remember!** \n", + "\n", + "In the previous step, we replaced the `objective_function` of our model. So if we run FVA in our current model, then the first reaction of our model " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AK5OdFU4yZZL" + }, + "outputs": [], + "source": [ + "%%capture\n", + "# Run FVA\n", + "fva_output = model.fva()\n", + "\n", + "# Get min and max flux values\n", + "pfk_min_fluxes = fva_output[0]\n", + "pfk_max_fluxes = fva_output[1]\n", + "\n", + "# Get the max flux distribution and the max biomass value when the objective function is maximum\n", + "pfk_max_biomass_flux_vector = fva_output[2]\n", + "pfk_max_biomass_objective = fva_output[3]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qGVto8ckp7LS" + }, + "source": [ + "We can go with that or if we need to go back in using the actual biomass function of our model, we can either do the same process as before to change again the `objective_function` or we can just build a new instance of the `MetabolicNetwork` class. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Mk-CtSOJLWo8" + }, + "outputs": [], + "source": [ + "%%capture\n", + "# Build new instance of the model\n", + "model = MetabolicNetwork.from_json('ext_data/e_coli_core.json')\n", + "\n", + "# Run FVA\n", + "fva_output = model.fva()\n", + "\n", + "# Get min and max flux values\n", + "min_fluxes = fva_output[0]\n", + "max_fluxes = fva_output[1]\n", + "\n", + "# Get the max flux distribution and the max biomass value when the objective function is maximum\n", + "max_biomass_flux_vector = fva_output[2]\n", + "max_biomass_objective = fva_output[3]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p81io8bFMcS_" + }, + "source": [ + "Let's see how they're different!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XYIkjxGxMcS_", + "outputId": "6b75b7b1-d749-4a8b-9ab0-6d01ce7b7f00" + }, + "outputs": [], + "source": [ + "print('This is the flux vector with the minimum values of each flux when the PFK reaction gets maximum: ', pfk_min_fluxes)\n", + "print('This is the flux vector with the minimum values of each flux when the biomass function gets maximum: ', min_fluxes)\n", + "\n", + "print('----------')\n", + "\n", + "print('This is the flux vector with the maximum values of each flux when the PFK reaction gets maximum: ', pfk_max_fluxes)\n", + "print('This is the flux vector with the maximum values of each flux when the biomass function gets maximum: ', max_fluxes)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fF9wB7y6McS_" + }, + "source": [ + "Apparently (and obviously) they are quite different!\n", + "\n", + "The same thing is with the flux vector and the max biomass objective." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bzol1NZmTAco" + }, + "source": [ + "### ..to sample the flux space of your model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gH3q_ImwT8H_" + }, + "source": [ + "Finally, one may sample the model's flux space using the [`PolytopeSampler`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/PolytopeSampler.py#L28) class of `dingo`. \n", + "\n", + "To do so, you first build an instance of this class based on your `model`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HHXSrEelXh4o" + }, + "outputs": [], + "source": [ + "from dingo import PolytopeSampler\n", + "sampler = PolytopeSampler(model)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TqJZftR1kMgC" + }, + "source": [ + "Then, the [`generate_steady_states()`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/PolytopeSampler.py#L119) method can be used for flux sampling. \n", + "\n", + "This method optionally gets as arguments the following: \n", + "\n", + "* the **effective sample size (ESS)** asked (by default `ESS = 1000`). ESS is the number of effectively independent draws from the posterior distribution that the Markov chain is equivalent to.\n", + " \n", + "* the **Potential Scale Reduction Factor (PSRF)** of each marginal flux to get an upper bound equal to 1.1 (by default that is set as `False`)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xo0P1z7BMcTA" + }, + "source": [ + "So, let's sample!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0y_TOtcGUmAW" + }, + "outputs": [], + "source": [ + "%%capture\n", + "steady_states = sampler.generate_steady_states(ess = 1000, psrf = True) # this took a little bit more than 5 minutes" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gy25rKbHUsVS" + }, + "source": [ + "Above, a great number of values of the flux of each reaction is computed, given that the system is always at a steady state. \n", + "\n", + "That means that we now have a **flux distribution** (all the values computed for the flux of a specific reaction) for each reaction of our model!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P-qbALhEY7lu" + }, + "source": [ + "Obviously, we have as many flux distributions as the reactions of our model:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5F1MNZZHZCU0", + "outputId": "356fe326-1342-4127-fc56-00afede234b9" + }, + "outputs": [], + "source": [ + "print('Number of reactions in my model: ', len(model.reactions))\n", + "print('Number of flux distributions sampled: ', steady_states.shape[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MOG6YGueZzpO" + }, + "source": [ + "And each flux distribution has as many flux values as `dingo` required to reach the ESS asked. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "767f9ouRagU5", + "outputId": "e330ec6b-4c57-4294-f81b-84b17e73f09e" + }, + "outputs": [], + "source": [ + "print('Number of flux values of the flux distributions returned: ', steady_states.shape[1])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "svc47hxNbo6L" + }, + "source": [ + "Let us now assume that we are interested in a certain reaction of our model, for example the last reaction of the glycolysis which is catalyzed by the Pyruvate kinase (PYK) enzyme. \n", + "\n", + "We first need to look in the `model.reactions` list and find the index of the `PYK` reaction; the enzyme is usually the name of reaction. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-im2Cgq5b3oS", + "outputId": "f0f9948a-99ef-46de-d05a-e21e49164011" + }, + "outputs": [], + "source": [ + "model.reactions.index('PYK')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "k6uheDpHfiHq" + }, + "source": [ + "Now that we know the index of the reaction of our interest, we can use the [`plot_histogram`](https://github.com/GeomScale/dingo/blob/a76b4be22f33feac86dff38746c7f2706afb2b67/dingo/utils.py#L177) function of `dingo` to plot its corresponding flux distribution. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 467 + }, + "id": "-n2HKr82yKyh", + "outputId": "2d924fe9-975b-482c-9da7-016c002de61c" + }, + "outputs": [], + "source": [ + "from dingo import plot_histogram\n", + "\n", + "reactions = model.reactions\n", + "plot_histogram(\n", + " steady_states[23], # here we set which reaction's flux we need to get \n", + " reactions[23], # here we provide the name of the reaction\n", + " n_bins = 60,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jl7tJ-ObS6L3" + }, + "source": [ + "From the flux distribution of PYK we can that most values fall between 1.7 and 1.9 and that it looks approximately normal, with a mean value at 1.775 mmol/gDW/h (millimoles per gram dry weight per hour).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 617 + }, + "id": "pQskY7EeWPdv", + "outputId": "715833d7-fc8f-452e-e0b2-3fdcf2532848" + }, + "outputs": [], + "source": [ + "from dingo.illustrations import plot_copula\n", + "\n", + "reaction_1 = [steady_states[23], reactions[23]]\n", + "reaction_2 = [steady_states[21], reactions[21]]\n", + "\n", + "plot_copula(reaction_1, reaction_2, n = 5, width = 900 , height = 600, export_format = \"svg\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IFsATtLyl6ak" + }, + "source": [ + "Applications of flux sampling can provide great insight in many ways. \n", + "\n", + "Based on the [iAB-AM\u00d8-1410](https://doi.org/10.1038/msb.2010.68) genome-scale metabolic model of human alveolar macrophages and the SARS-CoV-2 virus biomass objective function (VBOF) they built, [Renz et al. (2020)](https://doi.org/10.1093/bioinformatics/btaa813) investigated for alterations in the metabolism to come up with potential antiviral targets. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EgImVdw8sFk_" + }, + "source": [ + "In line with the findings of Renz et al. (2020), flux sampling in the metabolic model built from the macrophage model and the VBOF using `dingo` indicates **guanylate kinase (GK1)** as a potential antiviral target. \n", + "Flux sampling was perfored first with no objective function for the virus and then after maximizing the VBOF built by Renz et al. (2020). \n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v81QKmL3McTF" + }, + "source": [ + "\n", + "------\n", + "\n", + "We hope you will find dingo useful and easy-to-use!\n", + "\n", + "For anything which might be needed, don't hesitate to open an issue on the project's GitHub repo!\n", + "\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RHE9TmXCMcTF" + }, + "source": [ + "## References" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jYNJd6K9qgHB" + }, + "source": [ + "* Palsson Bernhard. [Systems biology](https://www.cambridge.org/core/books/systems-biology/7F8445BC87019806B3625DFC4B5C27D4). Cambridge university press, 2015. \n", + "\n", + "* Orth Jeffrey D., Ines Thiele, and Bernhard \u00d8. Palsson. [\"What is flux balance analysis?.\"](https://doi.org/10.1038/nbt.1614) Nature biotechnology 28.3 (2010): 245-248.\n", + "\n", + "* Chalkis Apostolos, Fisikopoulos Vissarion, Tsigaridas Elias and Zafeiropoulos Haris [\"Geometric Algorithms for Sampling the Flux Space of Metabolic Networks\"](https://drops.dagstuhl.de/opus/volltexte/2021/13820/pdf/LIPIcs-SoCG-2021-21.pdf)\n", + "\n", + "* Schellenberger, Jan, and Bernhard \u00d8. Palsson. [\"Use of randomized sampling for analysis of metabolic networks.\"](https://doi.org/10.1074/jbc.R800048200) Journal of biological chemistry 284, no. 9 (2009): 5457-5461." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dpn2LsGIIpTV" + }, + "source": [ + "## The GeomScale project" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qjgmQMNcItsV" + }, + "source": [ + "[Geomscale](https://geomscale.github.io/) is a research and development \n", + "project that delivers open source code for state-of-the-art algorithms at the intersection of data science, optimization, geometric, and statistical computing. \n", + "\n", + "`dingo` is based on the [`volesti` package](https://github.com/GeomScale/volesti), an open source C++ library for volume approximation and sampling of convex bodies.\n", + "\n", + "GeomScale is part of [NumFOCUS](https://numfocus.org/) and is a Google Summer of Code [(GSoC) organization](https://summerofcode.withgoogle.com/organizations/5553085268623360/). \n", + "\n", + "
\n", + "\n", + "" + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3.10.6 64-bit", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + }, + "vscode": { + "interpreter": { + "hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a" + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file