To compute the most relevant environmental scores (such as 2°C alignment and biodiversity impact) for every company, we need a rich database built on multiple, often unstructured, sources, an efficient impact computation, and frequent in-depth analyses of the quality of our scoring.
Among the company's ~35 employees, the Data Team (4 people) is part of the IT team (12 people) and is looking for an intern to:
- Improve our data extraction pipeline, which leverages Large Language Model (LLM) APIs and models (and a bunch of other nice tech bricks) to automatically extract the right data into the right format.
- Search for and collect new data sources (GHG emissions from companies, assets, protected natural areas, tons of products produced and consumed, company facilities, etc.).
- Automatically detect discrepancies between the default model and the refinements made by our analysts.
- Manage our large database of companies and its evolution over time.
- Build new products made possible by LLMs.
- Benchmark our scores against academic environmental papers.