Driving Innovation and Transformation Through Data Science & AI

Passionate Data Analyst Eager to Tackle New Challenges

https://github.com/heesukjang
https://www.linkedin.com/in/heesukjang/

I'm a data enthusiast with over 6 years of experience in data analytics, machine learning, and deep learning, specializing in Natural Language Understanding (NLU). My passion for technology keeps me curious, always eager to learn, and excited about finding innovative solutions.

I take pride in delivering high-quality code and keeping projects on track, all while mastering new tools and tackling complex challenges. I love sharing my knowledge, whether it’s training teammates on the latest tech or contributing to S&P Global’s Women In Technology initiatives. Talking about data science and machine learning energizes me, and I'm deeply passionate about inspiring and motivating other women in tech.

For me, success isn’t just about personal achievements. I find the greatest joy in helping others grow and creating an environment where everyone can thrive.

Machine Learning Algorithms

Strong foundation in regression, classification, clustering, and time series analysis to solve diverse business challenges.

Deep Learning & Neural Networks

Skilled in designing and training CNNs, RNNs, and transformer-based architectures for advanced machine learning projects.

Natural Language Processing (NLP)

Expertise in developing NLP solutions using BERT, BERTweet, and related models for various language understanding tasks.

Predictive Modeling & Analytics

Proven ability to create predictive models for energy market forecasting and anomaly detection, delivering actionable insights.

Data Visualization & Storytelling

Proficient in Tableau, Plotly Dash, Seaborn, and Matplotlib to create clear and impactful visual narratives that support decision-making.

Data Engineering & ETL Pipelines

Extensive experience in building and optimizing ETL pipelines using Databricks, PySpark, and Airflow for efficient data processing.

Big Data Technologies (e.g., Databricks, PySpark)

Hands-on experience with big data tools to handle large-scale datasets for analytics and modeling.

Model Deployment & MLOps

Successfully deployed ML models using AWS tools such as SageMaker and Lambda for robust real-time applications.

Statistical Analysis & Hypothesis Testing

Applied statistical techniques, including Bayesian methods, to validate models and support data-driven decision-making.

Cloud Computing Platforms (AWS, GCP)

Skilled in leveraging AWS and GCP for machine learning, data engineering, and model deployment.

Feature Engineering & Data Preprocessing

Expertise in transforming raw data into meaningful features that enhance model performance and accuracy.

Data Quality & Validation

Ensured data accuracy through rigorous validation processes across multiple environments (legacy and Databricks).

  • Vision Transformer (ViT): Applied to computer vision tasks for classifying waste image items as either recyclable or non-recyclable.
  • Natural Language Processing (NLP): Utilized BERT and BERTweet (a pre-trained language model for English tweets) to score essays written by English language learners in grades 8-12.
  • Convolutional Neural Networks (CNNs): Developed for image classification and recognition tasks, including:
    • Classifying IDC breast cancer histopathology images as cancerous or non-cancerous.
    • Predicting flight departure delays.
  • Transfer Learning with CNNs: Leveraged pre-trained models such as VGG16, VGG19, ResNet50, ResNet152, DenseNet201, Xception, InceptionV3, EfficientNetB7, and MobileNetV to improve training efficiency and performance, mainly for image classification tasks.
  • Long Short-Term Memory (LSTM): Applied to sentiment analysis in the energy industry to forecast the performance of energy-related mutual fund benchmarks.
  • Multi-Layer Perceptrons (MLPs): Applied to various classification and regression problems.
  • XGBoost: Effective for structured data with high interpretability, especially on large datasets.
  • Decision Trees & Random Forests: Used for robust predictive modeling with feature importance analysis.
  • Regression Techniques:
    • Linear & Logistic Regression: Applied to basic predictive tasks.
    • General Linear Model (GLM): Utilized for complex data relationships.
  • Clustering Techniques:
    • K-Means Clustering: Used for unsupervised data segmentation.
  • Time Series Modeling:
    • Seasonal ARIMA, STL Decomposition, Exponential Smoothing, and Prophet: Employed for forecasting and trend analysis on energy commodities.

Data Analyst

September 2021 - Present

► Develop five key accuracy metrics to evaluate the relative performance of 20 mainstream oil and gas price forecasts and forward curves, providing critical insights for 23 proprietary hedge funds.
► Build a Tableau dashboard to visualize price forecasts and forecast accuracy metrics for 20 oil and gas market data sets, streamlining manual reporting processes and saving approximately 5 hours per week.
► Engineer ETL solutions for market- and asset-level gas and oil data, significantly reducing average processing time from 7 days to under 5 minutes, leveraging Databricks, PySpark, Python, SQL, AWS, and Airflow.
► Train and mentor team members in adopting new technologies such as Databricks, PySpark, and AWS S3, which enabled the establishment and management of end-to-end data pipelines, cutting the average processing time for analysis-ready data from 6 weeks to a few days.
► Build a machine learning pipeline to detect anomalies on time-series data across different environments (production vs. development) and database systems (PostgreSQL and SQL Server) using various statistical methods such as hypothesis testing with t-tests and Augmented Dickey-Fuller tests, as well as advanced machine learning techniques such as Random Forest and RNN, improving detection accuracy by 95%.
► Active board member and speaker for various panel discussions in S&P Global’s Women in Technology, contributing to the promotion of diversity and inclusion within the tech community.

Teaching Assistant

January 23, 2023 - January 27, 2023

► Answered technical questions during a live-streamed Python training on Machine Learning (ML) and NLP for 75+ data practitioners at the Centers for Disease Control and Prevention (CDC).
► Guided trainees through code exercises in real-time breakout sessions and helped them advance ML- and NLP-driven work projects.

Senior Data Researcher

May 2018 - September 2021

► Utilized machine learning and data science techniques to deliver actionable insights on global power generation across 10 geographical regions covering 190+ countries.
► Built an interactive dashboard using Plotly Dash (Python) to formulate energy market- and asset-level impacts on fossil fuel outlook.
► Developed high-performing ETL pipelines for power and utility data, decreasing average processing time from 10 days to under 10 minutes.

Research Contractor

May 2019 - July 2019

Collaborated with the R&D team to develop an LSTM-based recurrent neural network, applying denoising tools (e.g., Fast Fourier Transform) to detect anomalies in time series sensor data on athletes’ performance, improving detection performance by 55%.

Quantitative Methodology Analyst

May 2006 - September 2009

Collaborated with cross-functional teams to develop and manage methodologies and algorithms on the KPIs of mutual fund and fund. Led quantitative analysis on the US side of individual funds, fund families, and asset classes/industries to identify trends and insights for investment decision-making.

University of California, Berkeley

Master of Information and Data Science - Graduated in May 2024

University of Colorado, Boulder

Studied Computer Science toward Bachelor of Science

University of Colorado, Denver

Pursued a Master of Business Administration (MBA) with an emphasis in Finance

  • Irvine, California, United States
