Data Science Engineer
DeFacto
Sep 2021 - Current
- Enhanced a Linear Regression prediction model that estimates stock values for new stores from regional sales history, and improved its performance with XGBoost.
- Created 375 new clothing combinations to increase sales by 15% by analyzing data and building a K-Means clustering model in Python on Google Cloud Platform.
- Developed an ETL pipeline with PySpark in JupyterHub to extract data from HDFS, a local DB, and GCP, clean and merge it, segment customers into 120 types, and load the results into an Elasticsearch cluster and a Kafka CRM integration system.