Technical Appendix
Data provenance, algorithms, equations, and known limitations. Written to the standard of a scientific paper's Methods section.
1. Data Provenance
Every external data source used by Molara, including retrieval method, version, and caching strategy.
| Source | URL / Version | Data Retrieved | Access | Cache / Rate |
|---|---|---|---|---|
| PubChem (NIH/NLM) | pubchem.ncbi.nlm.nih.gov/rest/pug | CID, molecular formula, MW, IUPAC name, canonical SMILES, XLogP, TPSA, HBD, HBA, description, 3D SDF coordinates | REST API (no key) | 5 req/sec; 5-min TTL |
| RDKit | rdkit.org ≥ 2024.3.5 | MW, LogP (Wildman-Crippen), TPSA, HBD, HBA, rotatable bonds, ring count, heavy atoms, Lipinski evaluation, 3D conformer generation | Local Python library | N/A |
| DDInter 2.0 | ddinter2.scbdd.com (2024) | Drug-drug interaction severity (Major / Moderate / Minor) for ~302,516 pairs across ~2,310 drugs | Local SQLite (pre-loaded from 8 ATC CSV files) | N/A (local) |
| OpenFDA | api.fda.gov/drug/label.json | Drug interaction warning text from FDA-approved drug labels | REST API (optional key) | 1,000/day; 10-min TTL |
| RCSB PDB | files.rcsb.org/download | Protein-drug co-crystal PDB coordinate files (5 curated pairs: COX-2, A2A, Penicillin Acylase, Neuraminidase, DHFR) | REST download (no key) | 10-min TTL |
| 3Dmol.js | 3dmol.csb.pitt.edu v2.x | Client-side WebGL molecular visualization (stick, sphere, cartoon render modes) | npm package | N/A |
| AI Language Model | LLM | AI-generated pharmacology Q&A; system prompt enriched with current molecule context | REST API (key required) | 1,024 tokens/response |
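The time-to-live (TTL) caching listed in the table can be illustrated with a minimal in-memory sketch. The class name `TTLCache`, the method `get_or_fetch`, and the two cache instances are hypothetical; the source does not say how Molara implements its cache, only the TTL values.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, calling `fetch` only if it is stale or missing."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]  # still fresh: skip the upstream request
        value = fetch()
        self._store[key] = (now, value)
        return value


# Hypothetical instances using the TTLs from the table (5 min and 10 min).
pubchem_cache = TTLCache(ttl_seconds=5 * 60)
openfda_cache = TTLCache(ttl_seconds=10 * 60)
```

Passing the clock in as a parameter keeps the expiry logic testable without real waiting. Rate limiting (for example, 5 requests per second for PubChem) would be a separate concern and is not shown here.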
2. Algorithms & Equations
Every computation performed by Molara, documented with the exact formulas and methods used.
2.1 Molecular Property Calculation
Molecular properties are computed from a canonical SMILES string using the RDKit cheminformatics library. PubChem provides the SMILES; RDKit recalculates selected descriptors server-side for consistency.
| Property | RDKit Method | Description |
|---|---|---|
| Molecular Weight | Descriptors.MolWt() | Sum of average atomic masses (Da) |
| LogP | Descriptors.MolLogP() | Wildman-Crippen octanol-water partition coefficient |
| TPSA | Descriptors.TPSA() | Topological polar surface area (Ertl method, Ų) |
| H-Bond Donors | Descriptors.NumHDonors() | Count of N–H and O–H groups |
| H-Bond Acceptors | Descriptors.NumHAcceptors() | Count of N and O atoms |
| Rotatable Bonds | Descriptors.NumRotatableBonds() | Non-terminal, non-ring single bonds |
| Ring Count | Descriptors.RingCount() | Number of rings in the smallest set of smallest rings (SSSR) |
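As a rough sketch, the descriptor calls in the table can be wrapped in one helper. The function name `compute_properties` and the dictionary keys are assumptions made for illustration; only the `Descriptors.*` calls come from the table.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def compute_properties(smiles: str) -> dict:
    """Compute the descriptors listed in the table from a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for unparseable SMILES
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return {
        "molecular_weight": Descriptors.MolWt(mol),        # average mass, Da
        "logp": Descriptors.MolLogP(mol),                  # Wildman-Crippen
        "tpsa": Descriptors.TPSA(mol),                     # Å^2
        "h_bond_donors": Descriptors.NumHDonors(mol),
        "h_bond_acceptors": Descriptors.NumHAcceptors(mol),
        "rotatable_bonds": Descriptors.NumRotatableBonds(mol),
        "ring_count": Descriptors.RingCount(mol),
    }


# Example input: aspirin
props = compute_properties("CC(=O)Oc1ccccc1C(=O)O")
```

Some descriptor values can differ slightly between RDKit releases, so pinning the library version (as in the provenance table) matters for reproducibility.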
2.2 3D Structure Generation
When a pre-computed 3D structure is unavailable from PubChem, Molara generates one from the SMILES string using a two-step process:
- Embedding: Explicit hydrogens are added, then a 3D conformer is generated with RDKit's ETKDGv3 (Experimental-Torsion Knowledge Distance Geometry v3) algorithm using a fixed random seed of 42 for reproducibility.
- Optimization: The conformer is energy-minimized with MMFF (Merck Molecular Force Field) for up to 500 iterations. The result is exported as an SDF mol block.
Pipeline: SMILES → Chem.MolFromSmiles() → Chem.AddHs() → AllChem.EmbedMolecule(ETKDGv3, seed=42) → AllChem.MMFFOptimizeMolecule(maxIters=500) → SDF
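The pipeline above could look roughly like the following in RDKit. The function name `smiles_to_sdf_block` and the error handling are assumptions; the seed, iteration cap, and sequence of calls follow the text.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_sdf_block(smiles: str, seed: int = 42, max_iters: int = 500) -> str:
    """Embed a 3D conformer with ETKDGv3, relax it with MMFF, return a mol block."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry

    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed for reproducible coordinates
    if AllChem.EmbedMolecule(mol, params) == -1:
        raise RuntimeError("3D embedding failed")

    # Return code: 0 = converged, 1 = iteration cap reached, -1 = MMFF setup failed.
    # This sketch ignores it and keeps whatever geometry is available.
    AllChem.MMFFOptimizeMolecule(mol, maxIters=max_iters)
    return Chem.MolToMolBlock(mol)


# Example input: ethanol
block = smiles_to_sdf_block("CCO")
```

A production version would likely inspect the optimizer's return code and fall back to the unoptimized conformer, or to another force field, when MMFF parameters are missing.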
2.3 Lipinski's Rule of Five
Lipinski's Rule of Five (Lipinski et al., 1997) predicts whether a compound is likely to be orally bioavailable. A molecule passes if all four criteria are satisfied:
$$\mathrm{MW} \le 500\ \mathrm{Da}, \qquad \mathrm{LogP} \le 5, \qquad \mathrm{HBD} \le 5, \qquad \mathrm{HBA} \le 10$$
where MW is molecular weight, LogP is the Wildman-Crippen partition coefficient, HBD is the count of hydrogen-bond donors (N–H, O–H), and HBA is the count of hydrogen-bond acceptors (N, O). Violations are counted and displayed alongside each rule's pass/fail status.
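A minimal sketch of the evaluation, assuming the conventional thresholds (MW ≤ 500 Da, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10) with non-strict inequalities. The names `LipinskiResult` and `evaluate_lipinski` are hypothetical, and the source does not state whether Molara uses strict or non-strict comparisons.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LipinskiResult:
    rules: Dict[str, bool]  # rule label -> True if the rule is satisfied
    violations: int
    passes: bool


def evaluate_lipinski(mw: float, logp: float, hbd: int, hba: int) -> LipinskiResult:
    """Check the four Rule-of-Five criteria and count violations."""
    rules = {
        "MW <= 500 Da": mw <= 500,
        "LogP <= 5": logp <= 5,
        "HBD <= 5": hbd <= 5,
        "HBA <= 10": hba <= 10,
    }
    violations = sum(not ok for ok in rules.values())
    # Following the text: a molecule passes only when all four criteria hold.
    return LipinskiResult(rules=rules, violations=violations, passes=violations == 0)


# Illustrative inputs only (not Molara output): a small molecule and a large one.
small = evaluate_lipinski(mw=180.2, logp=1.3, hbd=1, hba=3)
large = evaluate_lipinski(mw=1200.0, logp=3.0, hbd=5, hba=12)
```

Returning the per-rule map alongside the violation count matches the described display, where each rule's pass/fail status is shown next to the total.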
2.4 Pharmacokinetic Simulation
Molara uses a one-compartment oral dosing model with first-order absorption and first-order elimination. The body is modeled as a single, well-mixed compartment.
| Symbol | Parameter | Unit |
|---|---|---|
| F | Oral bioavailability | fraction (0–1) |
| D | Dose | mg |
| Vd | Volume of distribution | L |
| ka | Absorption rate constant | hr⁻¹ |
| ke | Elimination rate constant | hr⁻¹ |
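Plasma concentration at time t
$$C(t) = \frac{F \cdot D \cdot k_a}{V_d\,(k_a - k_e)} \left(e^{-k_e t} - e^{-k_a t}\right)$$
Time to maximum concentration
$$t_{max} = \frac{\ln(k_a / k_e)}{k_a - k_e}$$
Area under the curve (AUC)
$$AUC_{0 \to \infty} = \frac{F \cdot D}{V_d \cdot k_e}$$
Elimination half-life
$$t_{1/2} = \frac{\ln 2}{k_e}$$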
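The simulation generates 200 evenly spaced time points over a default window of 24 hours. Cmax is evaluated at tmax. When ka = ke, the general concentration expression is indeterminate (0/0), so its limit, obtained with L'Hôpital's rule, is used instead: $C(t) = \frac{F \cdot D \cdot k_a \cdot t \cdot e^{-k_e t}}{V_d}$, with $t_{max} = 1/k_e$.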
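The model can be sketched in plain Python using the standard closed-form solution of the one-compartment oral model, C(t) = F·D·ka / (Vd·(ka − ke)) · (e^(−ke·t) − e^(−ka·t)), together with the ka = ke special case stated above. The function names, the tolerance used to detect ka = ke, and the choice to include both endpoints of the time grid are assumptions.

```python
import math
from typing import Dict, List, Union


def concentration(t: float, F: float, D: float, Vd: float, ka: float, ke: float) -> float:
    """Plasma concentration (mg/L) at time t (hr) for first-order absorption and elimination."""
    if math.isclose(ka, ke, rel_tol=1e-9):
        # Limiting form when ka == ke (avoids division by zero).
        return F * D * ka * t * math.exp(-ke * t) / Vd
    return (F * D * ka) / (Vd * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))


def t_max(ka: float, ke: float) -> float:
    """Time of peak concentration (hr)."""
    if math.isclose(ka, ke, rel_tol=1e-9):
        return 1.0 / ke
    return math.log(ka / ke) / (ka - ke)


def simulate(F: float, D: float, Vd: float, ka: float, ke: float,
             hours: float = 24.0, n_points: int = 200) -> Dict[str, Union[float, List[float]]]:
    """Return the concentration-time curve plus summary PK metrics."""
    # Evenly spaced grid from 0 to `hours`, endpoints included (an assumption).
    times = [hours * i / (n_points - 1) for i in range(n_points)]
    concs = [concentration(t, F, D, Vd, ka, ke) for t in times]
    tm = t_max(ka, ke)
    return {
        "times": times,
        "concentrations": concs,
        "tmax": tm,
        "cmax": concentration(tm, F, D, Vd, ka, ke),  # evaluated at tmax, not on the grid
        "auc": F * D / (Vd * ke),                     # AUC from 0 to infinity
        "half_life": math.log(2) / ke,
    }


# Illustrative parameters only; not taken from any specific drug.
result = simulate(F=0.9, D=500.0, Vd=40.0, ka=1.2, ke=0.2)
```

Evaluating Cmax analytically at tmax, rather than taking the maximum over the 200 grid points, avoids underestimating the peak when tmax falls between samples.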
2.5 Drug Interaction Detection
Drug-drug interactions are detected using a two-source strategy that combines a local database with live FDA label queries.
Source A — DDInter 2.0 (local)
A pre-loaded SQLite database containing ~302,516 drug-drug interaction pairs from 8 ATC classification categories (A, B, D, H, L, P, R, V). Each pair has a severity classification: Major, Moderate, or Minor. Lookups are bidirectional and case-insensitive. 24 common brand names (e.g., Tylenol → Acetaminophen, Advil → Ibuprofen) are resolved via an alias map.
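A bidirectional, case-insensitive lookup with alias resolution might be sketched as follows. The table name `interactions`, its columns, and the function names are hypothetical, since the source does not describe the SQLite schema. The alias map holds only the two examples given in the text, and the demo row uses a placeholder drug name, not real DDInter data.

```python
import sqlite3
from typing import Optional

# Only the two aliases named in the text; the full 24-entry map is not listed.
BRAND_ALIASES = {"tylenol": "acetaminophen", "advil": "ibuprofen"}


def normalize(name: str) -> str:
    """Lower-case a drug name and resolve a known brand name to its generic."""
    key = name.strip().lower()
    return BRAND_ALIASES.get(key, key)


def lookup_severity(conn: sqlite3.Connection, drug_a: str, drug_b: str) -> Optional[str]:
    """Return the stored severity for a pair, checking both orderings."""
    a, b = normalize(drug_a), normalize(drug_b)
    row = conn.execute(
        """
        SELECT severity FROM interactions
        WHERE (LOWER(drug_a) = ? AND LOWER(drug_b) = ?)
           OR (LOWER(drug_a) = ? AND LOWER(drug_b) = ?)
        LIMIT 1
        """,
        (a, b, b, a),
    ).fetchone()
    return row[0] if row else None


# Demo with an in-memory database and one made-up placeholder row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (drug_a TEXT, drug_b TEXT, severity TEXT)")
conn.execute("INSERT INTO interactions VALUES ('Drug X', 'Ibuprofen', 'Moderate')")
```

With roughly 300,000 rows, storing names already lower-cased and indexing both columns would let the query use an index instead of applying `LOWER()` to every row.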
Source B — OpenFDA (live)
The FDA Drug Label API is queried in real time for each drug using the openfda.generic_name field (with fallback to openfda.brand_name). The drug_interactions section of the label is extracted and displayed as free-text interaction information. Results are cached for 10 minutes.
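The query construction and label parsing might look like the sketch below, which builds request URLs and parses a response without making a network call. The helper names are hypothetical, and the sample payload is a stand-in that mimics only the relevant part of the response shape (`results[0].drug_interactions` as a list of strings).

```python
from typing import List, Optional
from urllib.parse import urlencode

OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"

# Search fields in the order described above: generic name first, then brand name.
SEARCH_FIELDS = ("openfda.generic_name", "openfda.brand_name")


def build_label_query(drug: str, field: str, api_key: Optional[str] = None) -> str:
    """Build a label-search URL that matches `drug` as an exact phrase in `field`."""
    params = {"search": f'{field}:"{drug}"', "limit": 1}
    if api_key:  # the key is optional
        params["api_key"] = api_key
    return f"{OPENFDA_LABEL_URL}?{urlencode(params)}"


def candidate_urls(drug: str, api_key: Optional[str] = None) -> List[str]:
    """URLs to try in order; a caller would stop at the first one with results."""
    return [build_label_query(drug, field, api_key) for field in SEARCH_FIELDS]


def extract_interactions(payload: dict) -> List[str]:
    """Pull the drug_interactions section from the first matching label, if any."""
    results = payload.get("results") or []
    if not results:
        return []
    return list(results[0].get("drug_interactions", []))


# Stand-in payload for illustration; the text is a placeholder, not real label content.
sample = {"results": [{"drug_interactions": ["(label interaction text)"]}]}
urls = candidate_urls("ibuprofen")
sections = extract_interactions(sample)
```

Keeping URL construction separate from parsing lets the HTTP call and the 10-minute cache wrap `candidate_urls` without touching either helper.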
3. Known Limitations
Molara is an educational tool, not a clinical decision support system. Users should be aware of the following constraints.
Molecular Data
- RDKit-generated 3D coordinates are energy-minimized approximations (MMFF force field), not experimentally determined crystal structures.
- PubChem molecular properties are computationally derived; experimental values may differ, especially for LogP and TPSA.
- SMILES-based analysis does not distinguish stereoisomers (R/S, E/Z) unless the SMILES explicitly encodes chirality.
Pharmacokinetics
- The one-compartment model assumes instantaneous and uniform distribution to all tissues. Multi-compartment behavior (e.g., CNS penetration, adipose sequestration) is not captured.
- The model does not account for plasma protein binding, active metabolites, enterohepatic recirculation, renal/hepatic impairment, or drug-drug PK interactions.
- Parameters reflect population-level averages. Individual pharmacokinetic variability (age, weight, genetics) is not modeled.
Drug Interactions
- DDInter 2.0 covers approximately 2,310 drugs. Interactions involving newer, biosimilar, or less common agents may not be present.
- Severity classifications (Major / Moderate / Minor) are categorical and do not capture dose-dependent or patient-specific risk.
- OpenFDA label text reflects US FDA-approved labeling only. Regional regulatory differences are not represented.
AI Assistant
- AI responses are generated by a large language model and should never be used for clinical decision-making. Always consult a healthcare professional.
- The model may produce pharmacologically plausible but factually incorrect statements (hallucinations), particularly for rare drugs or novel research.
- A training data cutoff applies; the newest approved drugs or recent safety warnings may not be reflected.
4. Attribution
- Drug interaction data from DDInter 2.0 (Xiong et al., Nucleic Acids Research, 2022)
- Molecular data from PubChem PUG-REST API (National Library of Medicine, NIH)
- Protein structures from RCSB Protein Data Bank (Berman et al., 2000)
- RDKit: Open-source cheminformatics (rdkit.org)
- 3D conformer generation via ETKDGv3 (Riniker & Landrum, J. Chem. Inf. Model., 2015)
- 3D visualization powered by 3Dmol.js (Rego & Koes, 2015)
- Lipinski's Rule of Five (Lipinski et al., Adv. Drug Deliv. Rev., 1997)
- AI-powered pharmacology assistant