Technical Appendix

Data provenance, algorithms, equations, and known limitations. Written to the standard of a scientific paper's Methods section.

1. Data Provenance

Every external data source used by Molara, including retrieval method, version, and caching strategy.

Source	URL / Version	Data Retrieved	Access	Cache / Rate
PubChem (NIH/NLM)	pubchem.ncbi.nlm.nih.gov/rest/pug	CID, molecular formula, MW, IUPAC name, canonical SMILES, XLogP, TPSA, HBD, HBA, description, 3D SDF coordinates	REST API (no key)	5 req/sec; 5-min TTL
RDKit	rdkit.org ≥ 2024.3.5	MW, LogP (Wildman-Crippen), TPSA, HBD, HBA, rotatable bonds, ring count, heavy atoms, Lipinski evaluation, 3D conformer generation	Local Python library	N/A
DDInter 2.0	ddinter2.scbdd.com (2024)	Drug-drug interaction severity (Major / Moderate / Minor) for ~302,516 pairs across ~2,310 drugs	Local SQLite (pre-loaded from 8 ATC CSV files)	N/A (local)
OpenFDA	api.fda.gov/drug/label.json	Drug interaction warning text from FDA-approved drug labels	REST API (optional key)	1,000/day; 10-min TTL
RCSB PDB	files.rcsb.org/download	Protein-drug co-crystal PDB coordinate files (5 curated pairs: COX-2, A2A, Penicillin Acylase, Neuraminidase, DHFR)	REST download (no key)	10-min TTL
3Dmol.js	3dmol.csb.pitt.edu v2.x	Client-side WebGL molecular visualization (stick, sphere, cartoon render modes)	npm package	N/A
AI Language Model	LLM	AI-generated pharmacology Q&A; system prompt enriched with current molecule context	REST API (key required)	1,024 tokens/response
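
The TTL values in the table can be pictured with a small in-memory cache. The sketch below is only an illustration of the idea, not Molara's actual caching layer: the class name `TTLCache`, the `get_or_fetch` method, and the injectable `clock` are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds` (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for `key`; call `fetch` only on a miss or after expiry."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch()
        self._store[key] = (now, value)
        return value


# TTLs from the table above: 5 minutes for PubChem, 10 minutes for OpenFDA.
pubchem_cache = TTLCache(ttl_seconds=5 * 60)
openfda_cache = TTLCache(ttl_seconds=10 * 60)
```

A caller would wrap each remote request, for example `pubchem_cache.get_or_fetch(name, lambda: fetch_from_pubchem(name))`, where `fetch_from_pubchem` stands in for whatever function performs the real request.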

2. Algorithms & Equations

Every computation performed by Molara, documented with the exact formulas and methods used.

2.1 Molecular Property Calculation

Molecular properties are computed from a canonical SMILES string using the RDKit cheminformatics library. PubChem provides the SMILES; RDKit recalculates selected descriptors server-side for consistency.

Property	RDKit Method	Description
Molecular Weight	`Descriptors.MolWt()`	Sum of average atomic masses (Da)
LogP	`Descriptors.MolLogP()`	Wildman-Crippen octanol-water partition coefficient
TPSA	`Descriptors.TPSA()`	Topological polar surface area (Ertl method, Å²)
H-Bond Donors	`Descriptors.NumHDonors()`	Count of N–H and O–H groups
H-Bond Acceptors	`Descriptors.NumHAcceptors()`	Count of hydrogen-bond acceptor atoms (chiefly N and O)
Rotatable Bonds	`Descriptors.NumRotatableBonds()`	Non-terminal, non-ring single bonds
Ring Count	`Descriptors.RingCount()`	Number of rings in the SSSR (smallest set of smallest rings)

2.2 3D Structure Generation

When a pre-computed 3D structure is unavailable from PubChem, Molara generates one from the SMILES string using a two-step process:

Embedding — Explicit hydrogens are added, then a 3D conformer is generated with RDKit's ETKDGv3 (Experimental Torsion Knowledge Distance Geometry v3) algorithm using a fixed random seed of 42 for reproducibility.
Optimization — The conformer is energy-minimized with the MMFF (Merck Molecular Force Field) for up to 500 iterations. The result is exported as an SDF mol block.

Pipeline: SMILES → Chem.MolFromSmiles() → Chem.AddHs() → AllChem.EmbedMolecule(mol, params), where params = AllChem.ETKDGv3() and params.randomSeed = 42 → AllChem.MMFFOptimizeMolecule(mol, maxIters=500) → SDF

2.3 Lipinski's Rule of Five

Lipinski's Rule of Five (Lipinski et al., 1997) predicts whether a compound is likely to be orally bioavailable. A molecule passes if all four criteria are satisfied:

\text{MW} \leq 500 \text{ Da} \qquad \log P \leq 5 \qquad \text{HBD} \leq 5 \qquad \text{HBA} \leq 10

Where MW is molecular weight, LogP is the Wildman-Crippen partition coefficient, HBD is the count of hydrogen-bond donors (N–H, O–H), and HBA is the count of hydrogen-bond acceptors (N, O). Violations are counted and displayed alongside each rule's pass/fail status.
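
The pass/fail logic and violation count can be sketched as a pure function over already-computed descriptors. This is a minimal illustration, assuming the four thresholds above. The names `evaluate_lipinski` and `LipinskiResult` are hypothetical, and the descriptor values in the example are illustrative rather than output from Molara or RDKit.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LipinskiResult:
    rules: Dict[str, bool]  # rule label -> True if that rule is satisfied
    violations: int         # number of rules that failed

    @property
    def passes(self) -> bool:
        # Per the text above, a molecule passes only if all four criteria hold.
        return self.violations == 0


def evaluate_lipinski(mw: float, logp: float, hbd: int, hba: int) -> LipinskiResult:
    """Check the four Rule-of-Five thresholds and count violations."""
    rules = {
        "MW <= 500": mw <= 500,
        "LogP <= 5": logp <= 5,
        "HBD <= 5": hbd <= 5,
        "HBA <= 10": hba <= 10,
    }
    violations = sum(1 for ok in rules.values() if not ok)
    return LipinskiResult(rules=rules, violations=violations)


# Illustrative inputs only (hypothetical small and large molecules).
small = evaluate_lipinski(mw=180.2, logp=1.3, hbd=1, hba=3)
large = evaluate_lipinski(mw=734.0, logp=5.6, hbd=5, hba=14)
```

Keeping the per-rule booleans alongside the count makes it straightforward to display each rule's status next to the overall result.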

2.4 Pharmacokinetic Simulation

Molara uses a one-compartment oral dosing model with first-order absorption and first-order elimination. The body is modeled as a single, well-mixed compartment.

Symbol	Parameter	Unit
F	Oral bioavailability	fraction (0–1)
D	Dose	mg
V_d	Volume of distribution	L
k_a	Absorption rate constant	hr⁻¹
k_e	Elimination rate constant	hr⁻¹

Plasma concentration at time t

C(t) = \frac{F \cdot D \cdot k_a}{V_d \cdot (k_a - k_e)} \left( e^{-k_e t} - e^{-k_a t} \right)

Time to maximum concentration

t_{\max} = \frac{\ln(k_a \,/\, k_e)}{k_a - k_e}
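
This expression follows from setting the time derivative of C(t) to zero. A short derivation, assuming k_a ≠ k_e:

\frac{dC}{dt} = \frac{F \cdot D \cdot k_a}{V_d \cdot (k_a - k_e)} \left( -k_e e^{-k_e t} + k_a e^{-k_a t} \right) = 0

k_a e^{-k_a t_{\max}} = k_e e^{-k_e t_{\max}} \;\Longrightarrow\; e^{(k_a - k_e)\, t_{\max}} = \frac{k_a}{k_e} \;\Longrightarrow\; t_{\max} = \frac{\ln(k_a \,/\, k_e)}{k_a - k_e}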

Area under the curve (AUC)

\text{AUC}_{0 \to \infty} = \frac{F \cdot D}{V_d \cdot k_e}
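
The closed form can be checked by integrating C(t) term by term (again assuming k_a ≠ k_e). The absorption rate constant cancels, so the total exposure depends only on F, D, V_d, and k_e:

\text{AUC}_{0 \to \infty} = \int_0^{\infty} C(t)\, dt = \frac{F \cdot D \cdot k_a}{V_d \cdot (k_a - k_e)} \left( \frac{1}{k_e} - \frac{1}{k_a} \right) = \frac{F \cdot D \cdot k_a}{V_d \cdot (k_a - k_e)} \cdot \frac{k_a - k_e}{k_a \, k_e} = \frac{F \cdot D}{V_d \cdot k_e}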

Elimination half-life

t_{1/2} = \frac{\ln 2}{k_e} \approx \frac{0.693}{k_e}

The simulation generates 200 evenly spaced time points over a default window of 24 hours. C_max is evaluated at t_max. When k_a = k_e, the general expression for C(t) is undefined (0/0), so its limit, obtained with L'Hôpital's rule, is used instead: C(t) = (F · D · k_a · t · e^(−k_e·t)) / V_d, with t_max = 1/k_e.
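
The equations in this section translate almost directly into code. The following is a sketch under stated assumptions rather than Molara's implementation: the function names are hypothetical, the parameter values are invented for illustration, and the 200 points are assumed to include both endpoints of the window.

```python
import math
from typing import List, Tuple


def concentration(t: float, F: float, D: float, Vd: float, ka: float, ke: float) -> float:
    """Plasma concentration at time t for the one-compartment oral model."""
    if math.isclose(ka, ke, rel_tol=1e-9):
        # Limiting form used when k_a = k_e (general formula would be 0/0).
        return F * D * ka * t * math.exp(-ke * t) / Vd
    return F * D * ka / (Vd * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))


def t_max(ka: float, ke: float) -> float:
    """Time of peak concentration."""
    if math.isclose(ka, ke, rel_tol=1e-9):
        return 1.0 / ke
    return math.log(ka / ke) / (ka - ke)


def auc_inf(F: float, D: float, Vd: float, ke: float) -> float:
    """Area under the curve from 0 to infinity."""
    return F * D / (Vd * ke)


def half_life(ke: float) -> float:
    """Elimination half-life."""
    return math.log(2) / ke


def simulate(F: float, D: float, Vd: float, ka: float, ke: float,
             hours: float = 24.0, n_points: int = 200) -> List[Tuple[float, float]]:
    """Return (time, concentration) pairs on an evenly spaced grid from 0 to `hours`."""
    step = hours / (n_points - 1)  # assumption: both endpoints are included
    return [(i * step, concentration(i * step, F, D, Vd, ka, ke)) for i in range(n_points)]


# Hypothetical parameters chosen only for illustration (mg, L, 1/hr).
params = dict(F=0.9, D=500.0, Vd=40.0, ka=1.2, ke=0.2)
curve = simulate(**params)
tmax = t_max(params["ka"], params["ke"])
cmax = concentration(tmax, **params)
```

Evaluating C_max analytically at t_max, rather than taking the largest grid value, avoids underestimating the peak when t_max falls between two grid points.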

2.5 Drug Interaction Detection

Drug-drug interactions are detected using a two-source strategy that combines a local database with live FDA label queries.

Source A — DDInter 2.0 (local)

A pre-loaded SQLite database containing ~302,516 drug-drug interaction pairs from 8 ATC classification categories (A, B, D, H, L, P, R, V). Each pair has a severity classification: Major, Moderate, or Minor. Lookups are bidirectional and case-insensitive. 24 common brand names (e.g., Tylenol → Acetaminophen, Advil → Ibuprofen) are resolved via an alias map.
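
A bidirectional, case-insensitive lookup with alias resolution might look like the sketch below. The table and column names (`interactions`, `drug_a`, `drug_b`, `severity`) are assumptions, since the actual schema is not documented here. The inserted row uses a placeholder drug and a placeholder severity and is not DDInter data.

```python
import sqlite3
from typing import Optional

# Two of the brand-name aliases mentioned above; the full 24-entry map is not reproduced.
ALIASES = {"tylenol": "Acetaminophen", "advil": "Ibuprofen"}


def canonical_name(name: str) -> str:
    """Map a brand name to its generic name; leave unknown names unchanged."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def lookup_severity(conn: sqlite3.Connection, drug_1: str, drug_2: str) -> Optional[str]:
    """Return the stored severity for a pair, checking both orderings."""
    a, b = canonical_name(drug_1), canonical_name(drug_2)
    row = conn.execute(
        """
        SELECT severity FROM interactions
        WHERE (drug_a = ? AND drug_b = ?)
           OR (drug_a = ? AND drug_b = ?)
        LIMIT 1
        """,
        (a, b, b, a),
    ).fetchone()
    return row[0] if row else None


# Hypothetical schema; COLLATE NOCASE makes comparisons ASCII case-insensitive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactions ("
    "drug_a TEXT COLLATE NOCASE, drug_b TEXT COLLATE NOCASE, severity TEXT)"
)
# Placeholder row for demonstration only; not taken from DDInter.
conn.execute("INSERT INTO interactions VALUES ('Acetaminophen', 'PlaceholderDrug', 'Moderate')")
```

Querying both orderings in one statement means each pair needs to be stored only once, whichever drug the user enters first.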

Source B — OpenFDA (live)

The FDA Drug Label API is queried in real time for each drug using the openfda.generic_name field (with fallback to openfda.brand_name). The drug_interactions section of the label is extracted and displayed as free-text interaction information. Results are cached for 10 minutes.
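
The query-with-fallback flow can be sketched as follows. This is an illustration, not Molara's code: the helper names are hypothetical, `limit=1` is an assumed choice, and it is assumed that the brand-name fallback is tried whenever the generic-name query yields no interaction text. The HTTP call is injectable so the logic can be exercised without network access.

```python
import json
from typing import Callable, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

LABEL_ENDPOINT = "https://api.fda.gov/drug/label.json"


def build_label_url(drug: str, field: str) -> str:
    """Build a label search URL such as search=openfda.generic_name:"ibuprofen"."""
    query = urlencode({"search": f'{field}:"{drug}"', "limit": 1})
    return f"{LABEL_ENDPOINT}?{query}"


def http_get_json(url: str) -> Optional[dict]:
    """Fetch and decode JSON; return None on network or HTTP errors."""
    try:
        with urlopen(url, timeout=10) as response:
            return json.load(response)
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return None


def extract_interactions(payload: Optional[dict]) -> List[str]:
    """Pull the drug_interactions section (a list of strings) from the first result."""
    if not payload:
        return []
    results = payload.get("results") or []
    return list(results[0].get("drug_interactions", [])) if results else []


def fetch_interaction_text(
    drug: str, get_json: Callable[[str], Optional[dict]] = http_get_json
) -> List[str]:
    """Query by generic name first, then fall back to brand name."""
    for field in ("openfda.generic_name", "openfda.brand_name"):
        text = extract_interactions(get_json(build_label_url(drug, field)))
        if text:
            return text
    return []
```

In practice the result of `fetch_interaction_text` would be stored under the drug name with the 10-minute expiry described above, so repeated lookups do not count against the daily request limit.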

3. Known Limitations

Molara is an educational tool, not a clinical decision support system. Users should be aware of the following constraints.

Molecular Data

•RDKit-generated 3D coordinates are energy-minimized approximations (MMFF force field), not experimentally determined crystal structures.
•PubChem molecular properties such as XLogP and TPSA are computationally derived; experimentally measured values (for example, LogP) may differ.
•SMILES-based analysis does not distinguish stereoisomers (R/S, E/Z) unless the SMILES explicitly encodes chirality.

Pharmacokinetics

•The one-compartment model assumes instantaneous and uniform distribution to all tissues. Multi-compartment behavior (e.g., CNS penetration, adipose sequestration) is not captured.
•The model does not account for plasma protein binding, active metabolites, enterohepatic recirculation, renal/hepatic impairment, or drug-drug PK interactions.
•Parameters reflect population-level averages. Individual pharmacokinetic variability (age, weight, genetics) is not modeled.

Drug Interactions

•DDInter 2.0 covers approximately 2,310 drugs. Interactions involving newer, biosimilar, or less common agents may not be present.
•Severity classifications (Major / Moderate / Minor) are categorical and do not capture dose-dependent or patient-specific risk.
•OpenFDA label text reflects US FDA-approved labeling only. Regional regulatory differences are not represented.

AI Assistant

•AI responses are generated by a large language model and should never be used for clinical decision-making. Always consult a healthcare professional.
•The model may produce pharmacologically plausible but factually incorrect statements (hallucinations), particularly for rare drugs or novel research.
•A training data cutoff applies; the newest approved drugs or recent safety warnings may not be reflected.

4. Attribution

Drug interaction data from DDInter 2.0 (Xiong et al., Nucleic Acids Research, 2022)
Molecular data from PubChem PUG-REST API (National Library of Medicine, NIH)
Protein structures from RCSB Protein Data Bank (Berman et al., 2000)
RDKit: Open-source cheminformatics (rdkit.org)
3D conformer generation via ETKDG (Riniker & Landrum, J. Chem. Inf. Model., 2015); the v3 variant is described in Wang et al., J. Chem. Inf. Model., 2020
3D visualization powered by 3Dmol.js (Rego & Koes, 2015)
Lipinski's Rule of Five (Lipinski et al., Adv. Drug Deliv. Rev., 1997)
AI-powered pharmacology assistant