We are looking for a Lead Data Engineer to take charge of designing and managing our data infrastructure. You will lead the development of scalable, high-performance data models. You will oversee our ETL pipelines and data ingestion processes, and collaborate closely with data scientists to ensure their machine learning models are integrated smoothly into production. You will also play a key role in defining the infrastructure needed for heterogeneous data ingestion, ML training, and ML Ops, ensuring the right pipelines, monitoring, and automation are in place.
Key Responsibilities:
- Lead the design and optimization of data models and infrastructure to support large-scale data processing.
- Oversee and manage the data layer architecture, currently built on Cube.dev and MongoDB, with the key objective of evaluating, and potentially transitioning to, an SQL-based system (e.g., PostgreSQL) for better performance.
- Manage geospatial data, ensuring location-based data is handled efficiently for analysis, storage, and visualization.
- Build and maintain robust ETL pipelines and data ingestion streams that ensure high availability, reliability, and performance of data systems.
- Collaborate with the data science team to ensure the integration of machine learning models into production environments, focusing on efficient model deployment, monitoring, and iteration.
- Design and implement ML Ops infrastructure to support model training, experimentation, and deployment, including experiment tracking, model versioning, and scalable training processes.
- Define and implement best practices for data governance, ensuring security, quality, and compliance.
- Evaluate and adopt new tools and technologies to improve data processing, with a focus on real-time data ingestion and scalable ML infrastructure.
- Provide leadership in shaping the future of our data architecture, ensuring it aligns with the company’s goals of sustainability and high-impact analytics.