https://github.com/heesukjang
https://www.linkedin.com/in/heesukjang/
Strong foundation in regression, classification, clustering, and time series analysis to solve diverse business challenges.
Skilled in designing and training CNNs, RNNs, and transformer-based architectures for advanced machine learning projects.
Expertise in developing NLP solutions using BERT, BERTweet, and related models for various language understanding tasks.
Proven ability to create predictive models for energy market forecasting and anomaly detection, delivering actionable insights.
Proficient in Tableau, Plotly Dash, Seaborn, and Matplotlib to create clear and impactful visual narratives that support decision-making.
Extensive experience in building and optimizing ETL pipelines using Databricks, PySpark, and Airflow for efficient data processing.
Hands-on experience with big data tools to handle large-scale datasets for analytics and modeling.
Successfully deployed ML models using AWS tools such as SageMaker and Lambda for robust real-time applications.
Applied statistical techniques, including Bayesian methods, to validate models and ensure data-driven decision-making.
Skilled in leveraging AWS and GCP for machine learning, data engineering, and model deployment.
Expertise in transforming raw data into meaningful features that enhance model performance and accuracy.
Ensured data accuracy through rigorous validation processes across multiple environments (legacy and Databricks).
September 2021 - Present
► Develop five key accuracy metrics to evaluate the relative performance of 20 mainstream oil and gas price forecasts and forward curves, providing critical insights for 23 proprietary hedge funds.
► Build a Tableau dashboard to visualize price forecasts and forecast accuracy metrics for 20 oil and gas market data sets, streamlining manual reporting processes and saving approximately 5 hours per week.
► Engineer ETL solutions for market- and asset-level gas and oil data, significantly reducing average processing time from 7 days to under 5 minutes, leveraging Databricks, PySpark, Python, SQL, AWS, and Airflow.
► Train and mentor team members in adopting new technologies such as Databricks, PySpark, and AWS S3, which enabled the establishment and management of end-to-end data pipelines, cutting the average processing time for analysis-ready data from 6 weeks to a few days.
► Build a machine learning pipeline to detect anomalies on time-series data across different environments (production vs. development) and database systems (PostgreSQL and SQL Server) using various statistical methods such as hypothesis testing with t-tests and Augmented Dickey-Fuller tests, as well as advanced machine learning techniques such as Random Forest and RNN, improving detection accuracy by 95%.
► Active board member and speaker for various panel discussions in S&P Global’s Women in Technology, contributing to the promotion of diversity and inclusion within the tech community.
January 23, 2023 - January 27, 2023
► Answered technical questions during live-streamed Python training on Machine Learning (ML) and NLP for 75+ data practitioners at the Centers for Disease Control and Prevention (CDC).
► Guided trainees through real-time code exercises in breakout sessions and helped them move forward with ML- and NLP-driven work projects.
May 2018 - September 2021
► Utilized machine learning and data science techniques to deliver actionable insights on global power generation across 10 geographical regions covering 190+ countries.
► Built an interactive dashboard in Python with Dash to show energy market- and asset-level impacts on the fossil fuel outlook.
► Developed high-performing ETL pipelines for power and utility data, decreasing average processing time from 10 days to under 10 minutes.
May 2019 - July 2019
Collaborated with the R&D team to develop an LSTM-based RNN, applying denoising tools (e.g., Fast Fourier Transform) to detect anomalies in time-series sensor data on athletes’ performance, improving detection performance by 55%.
May 2006 - September 2009
Collaborated with cross-functional teams to develop and manage methodologies and algorithms on the KPIs of mutual fund and fund. Led the quantitative analysis on the US side of individual funds, fund families, and asset classes/industries to identify trends and insights for investment decision-making.
Master of Information and Data Science - Graduated in May 2024
Pursued a Master of Business Administration (MBA) with an emphasis in Finance