21/08/2023
Built an LLM-driven QA and anomaly-detection pipeline in Databricks (PySpark + SQL) to validate 2.7M+ gas flow aggregates (2005–present) across 400+ US, Mexico, and Canada pipelines at the component and throughput-group levels. The pipeline surfaces root causes through tabular anomaly outputs (multiple anomaly types), visuals, and plain-text GPT-4 reports. It cut manual QA by ~85%, sped up detection ~15x, saved ~100 hrs/month, and helped close ~90% of issues cross-functionally by shifting work from manual investigation to scalable, automated detection and faster resolution.
Developed ML dashboards and NLP models to detect gas flow anomalies and automate post-merger data reconciliation, improving accuracy by 95% and cutting processing time by 80%.
Developed five accuracy metrics to evaluate 20 mainstream oil and gas price forecasts and forward curves, providing insights for 23 proprietary hedge funds. Created an interactive Tableau dashboard to visualize the forecasts and their accuracy metrics, streamlining manual reporting and saving around 5 hours each week.
Built an NLP-based organization entity resolution pipeline to support Named Entity Recognition (NER) workflows by extracting and normalizing company (ORG) entities and deduplicating 40,000+ name variants. Used TF-IDF vectorization and similarity scoring to map abbreviations, legal suffix variations (Inc/LLC/Ltd), omissions, and typos to a single canonical company name for consistent downstream analytics.
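A minimal sketch of the TF-IDF matching step described above, assuming character trigrams as features and cosine similarity as the score. The suffix list, the `resolve` helper, the 0.4 threshold, and the example company names are all illustrative assumptions, not the production pipeline.

```python
import math
import re
from collections import Counter

# Hypothetical suffix list; a real pipeline would likely use a longer one.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def char_ngrams(text, n=3):
    """Split a padded string into overlapping character n-grams."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (as a dict) per document."""
    counts = [Counter(char_ngrams(d)) for d in docs]
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF so that n-grams seen in every document keep some weight.
        vec = {g: tf * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def resolve(variants, canonical, threshold=0.4):
    """Map each variant to its most similar canonical name, or None."""
    docs = [normalize(x) for x in canonical + variants]
    vecs = tfidf_vectors(docs)
    canon_vecs, variant_vecs = vecs[:len(canonical)], vecs[len(canonical):]
    result = {}
    for name, vec in zip(variants, variant_vecs):
        scores = [cosine(vec, c) for c in canon_vecs]
        best = max(range(len(canonical)), key=scores.__getitem__)
        result[name] = canonical[best] if scores[best] >= threshold else None
    return result


if __name__ == "__main__":
    canonical_names = ["Kinder Morgan", "Enbridge"]
    name_variants = ["Kinder Morgan Inc.", "Enbrige LLC", "Kindr Morgan"]
    for variant, match in resolve(name_variants, canonical_names).items():
        print(f"{variant!r} -> {match!r}")
```

Character n-grams tolerate typos and omissions better than word tokens, which is why they are a common choice for name matching; at 40,000+ variants a sparse-matrix implementation would be needed in place of this pairwise loop.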
Developed an interactive web interface with Python Dash and Plotly to explore global energy trends and forecast fossil fuel prices. Applied time series models, including Seasonal ARIMA, STL decomposition, exponential smoothing, and Prophet, to generate the energy price forecasts shown in the interface.
Designed an NLP-based sentiment and impact prediction pipeline that uses tokenization, embeddings, and LSTM neural networks with a softmax output to classify energy-related news headlines as positive or negative signals for oil-and-gas index values. Trained the model with cross-entropy loss and hyperparameter tuning, giving a clearer view of sentiment-driven market behavior.
Applied multivariate regression to predict future capital expenditures and operating costs of U.S. power plants, accounting for various fuel and technology types. This analysis enabled more accurate financial forecasting, supporting decision-making across different plant configurations and cost structures.
Built a Python application that maps U.S. gas power plants to their nearest gas hub district using K-Means clustering with the Vincenty distance formula. The mapping kept every power plant within 12 miles of every other plant in its hub, helping optimize distribution networks and improve logistical efficiency.
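A minimal sketch of the nearest-hub assignment and the 12-mile check from the plant-mapping work, under stated assumptions: it uses the haversine great-circle distance as a simpler stand-in for Vincenty, it shows only the assignment step of K-Means (not the centroid update), and all names and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; haversine assumes a sphere


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def assign_to_nearest_hub(plants, hubs):
    """Map each plant name to the name of its closest hub."""
    return {
        name: min(hubs, key=lambda hub: haversine_miles(coord, hubs[hub]))
        for name, coord in plants.items()
    }


def max_pairwise_miles(coords):
    """Largest distance between any two points; 0.0 for fewer than two points."""
    return max(
        (haversine_miles(a, b) for a, b in combinations(coords, 2)),
        default=0.0,
    )


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    hubs = {"hub_a": (29.76, -95.37), "hub_b": (41.88, -87.63)}
    plants = {
        "plant_1": (29.76, -95.37),
        "plant_2": (29.80, -95.40),
        "plant_3": (41.90, -87.65),
    }
    assignment = assign_to_nearest_hub(plants, hubs)
    for hub in hubs:
        members = [plants[p] for p, h in assignment.items() if h == hub]
        spread = max_pairwise_miles(members)
        print(hub, len(members), "plants, within 12 miles:", spread <= 12)
```

Vincenty's formula models the Earth as an ellipsoid and is more accurate than haversine over long distances; at a 12-mile scale the two differ by only a fraction of a percent, so the check is unlikely to change except for pairs very near the limit.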