Data Scientist/Data Engineer Intern - Paris

💡 Transition partners
Internship
Location: Paris, France
Occasional remote work allowed
Posted on 07-29-2024

Iceberg Data Lab

Iceberg Data Lab provides environmental data solutions to financial institutions, mostly by computing climate and biodiversity impacts of companies.

💡 Transition partners

Organizations in this category aim to help companies and citizens improve their environmental and social impact, for example through CSR consulting, training, raising awareness of transition issues, or media.

More information
  • Between 15 and 50 people
Impact study
Iceberg Data Lab has not yet shared its impact measurement.
Labels and certifications
This organization has not shared any labels or certifications it has obtained.

To compute the most relevant environmental scores for every company (such as 2°C alignment and biodiversity impact), we need a rich database built from multiple, often unstructured, sources; an efficient impact computation; and frequent in-depth analyses of the quality of our scoring.

Among the company's ~35 employees, the Data Team (4 people) is part of the IT team (12 people) and is looking for an intern to:

  • Improve our data extraction pipeline, which leverages Large Language Model (LLM) APIs and models (and a bunch of other nice tech building blocks) to automatically extract the right data into the right format.
  • Search for and collect new data sources (company GHG emissions, assets, protected natural areas, tonnage of products produced and consumed, company facilities, etc.).
  • Automatically detect discrepancies between the default model and all refinements made by our analysts.
  • Manage our large company database and its evolution over time.
  • Build new products made possible by LLMs.
  • Benchmark our scores against academic environmental papers.
Profile

Minimum: Python, data science, data analysis, data engineering, Pandas, SQL, Git, interest in environmental issues.

Optional: machine learning, Docker, LangChain, Ollama, Airflow, FastAPI, Elasticsearch, Streamlit, Spark, Kubernetes, Celery, Ruby, financial or environmental knowledge.