About me

My name is Jakub Domerecki and I am a Python Software Engineer.

I have been working with Python for 13 years. I specialize in data engineering, with an emphasis on PySpark and Microsoft Azure services.

Contact me on LinkedIn

Experience

CSHARK (May 2023 - Present)

Senior Data Engineer (August 2024 - Present)

  • Source integration:
    • Onboard data from various APIs using Spark's Python Data Source API.
    • Create a metadata-driven solution to incrementally load data from SQL Server to Delta Lake using CDC.
    • Develop both classic and Lakeflow Declarative Pipelines in Databricks.
  • CI/CD and dev environment:
    • Establish a complete automated CI pipeline with unit and integration testing, package publishing, and deployment using Databricks Asset Bundles in GitLab CI.
    • Design and implement mock services for Python Data Source API integration testing.
    • Create a modern development environment with pre-commit hooks, uv dependency management, and Docker containerization.
  • Performance optimization:
    • Optimize Spark job performance for improved cost efficiency.
    • Build reusable Python libraries to be shared across projects.

Data Engineer (May 2023 - August 2024)

  • Create data models to support transactional processing as well as BI analytics.
  • Data Processing Solutions:
    • Develop and integrate custom libraries and Spark jobs for Synapse.
    • Create parameterized data processing pipelines (ADF).
  • Data Testing Strategy: Develop and implement comprehensive data testing strategies.
  • Automated Testing: Execute unit, integration, and data validation tests.
  • CI/CD Pipeline Development: Lead GitLab CI/CD pipeline creation for Azure Synapse resource deployment.
  • Team Onboarding: Transfer knowledge to and train new team members.

Capgemini (Oct 2021 - Apr 2023)

Data Engineer

  • Provision Azure services
  • Create and maintain Azure DevOps repositories and pipelines
  • Refactor ETL pipeline into a medallion architecture
  • Write PySpark and Spark SQL notebooks in Databricks
  • Tune Spark performance
  • Create and maintain Azure Data Factory pipelines
  • Conduct technical workshops

Nokia (Jul 2018 - Sep 2021)

Software Integration Engineer

  • Lead a 3-person HW regression testing project
  • Perform fault analysis and prepare root cause analyses (RCA)
  • Create automated test scripts (Python and Robot Framework)
  • Design and implement a data visualization tool (Bokeh and Pandas)

WBP Drosystem (Jun 2014 - Jun 2018)

Senior Designer's Assistant

  • Dimension road drainage systems in Python
  • Scrape data in Python and script Excel for mail automation