Data Scientist/Data Engineer Intern - Paris

💡 Transition partner
Internship
Location: Paris, France
Occasional remote work allowed
Published on 29/07/2024

Iceberg Data Lab

Iceberg Data Lab provides environmental data solutions to financial institutions, mainly by computing the climate and biodiversity impacts of companies.

💡 Transition partner

The mission of this organization is to help companies or citizens improve their environmental and social impact. Examples include CSR consulting, training, raising awareness of transition issues, and media.

Impact measurement
Iceberg Data Lab has not yet submitted an impact measurement.
Labels and certifications
This organization has chosen not to share any labels or certifications it may have obtained.

To compute the most relevant environmental scores (such as 2°C alignment and biodiversity impact) for every company, we need a rich database relying on multiple sources (often unstructured), efficient impact computation, and frequent in-depth analyses of the quality of our scoring.

Among the company's ~35 employees, the Data Team (4 people) is part of the IT team (12 people) and is looking for an intern to:

  • Improve our data extraction pipeline, which leverages Large Language Model (LLM) APIs and models (and a bunch of other nice tech bricks) to automatically extract the right data into the right format.
  • Search for and collect new data sources (GHG emissions from companies, assets, protected natural areas, tonnes of products produced and consumed, company facilities, etc.).
  • Automatically detect discrepancies between the default model and all refinements made by our analysts.
  • Manage our large company database and its evolution over time.
  • Build new products made possible by LLMs.
  • Benchmark our scores against academic environmental papers.
Desired profile

Minimum: Python, Data Science, data analysis, data engineering, Pandas, SQL, Git, interest in environmental issues.

Optional: Machine Learning, Docker, LangChain, Ollama, Airflow, FastAPI, Elasticsearch, Streamlit, Spark, Kubernetes, Celery, Ruby, financial or environmental knowledge.