E T Consultant
Job #: req22840
Sector: Information Technology
Term Duration: 1 year 0 months
Recruitment Type: Local Recruitment
Location: Washington, DC, United States
Required Language(s): English
Preferred Language(s):
Closing Date: 6/5/2023 (MM/DD/YYYY) at 11:59pm UTC
IFC—a member of the World Bank Group—is the largest global development institution focused on the private sector in emerging markets. We work in more than 100 countries, using our capital, expertise, and influence to create markets and opportunities in developing countries. In fiscal year 2022, IFC committed a record $32.8 billion to private companies and financial institutions in developing countries, leveraging the power of the private sector to end extreme poverty and boost shared prosperity as economies grapple with the impacts of global compounding crises. For more information, visit www.ifc.org.
The ESG Sustainability Advice and Solutions Department (CEG) is IFC’s center of excellence in Environment, Social, and Governance (ESG). It offers a range of expertise to help IFC’s Investment and Advisory clients identify and solve complex ESG risk-related challenges and find value-added opportunities in their business operations. More details can be found at www.ifc.org/sustainability.
CEG has a long history of innovation centered on integrating ESG considerations into emerging markets. A dedicated Innovation and Data Science function within the Department has been mandated to serve as a learning, technology exploration, experimentation, and knowledge hub to explore, test, and understand emerging technologies, using design thinking and lean-agile principles to enable IFC to fully harness the Digital Age and achieve its mission.
CEG is building an Artificial Intelligence-powered platform known as Machine Learning ESG Analyst (MALENA) to create ESG risk and impact assessment capacity at scale. More details can be found on the MALENA Page.
The purpose of these Terms of Reference is to hire an extended-term consultant (ETC) to support the MALENA team and the CEG Data Science work program in building version 2.0 Machine Learning/Deep Learning models, exploring emergent use cases such as Generative AI, and creating real-time inference pipelines for beta testers.
Roles and Responsibilities:
- Report to CEG Innovation Lead and work under the supervision of the CEG Lead Data Scientist.
- Work with ESG stakeholders to understand the business problem and connect those problems with solvable Data Science solutions.
- Audit the different text data assets of the Department and determine how to analyze these data assets for insights.
- Clean and prepare text data to enable Natural Language understanding.
- Prepare high-quality training data with appropriate coverage of the ESG business domain.
- Apply different natural language processing (NLP) techniques to analyze the sentiment within the purview of IFC’s ESG business domain.
- Build ingestion processes to prepare, extract, and annotate a variety of unstructured data sources (social media, news, internal/external documents, images, video, voice, emails, financial data, and operational data).
- Build data automation and integration solutions to ease business problem-solving and enable data source connections and data quality assurance.
- Leverage a variety of tools and approaches to solve complex business objectives, including statistical NLP, information retrieval/extraction, Machine Learning/Deep Learning, Large Language Modelling, Machine Translation, and semantic search.
- Optimize and automate ways to label unstructured data from various data sources.
- Experiment with multiple machine learning and large language models (LLMs) and choose the optimal model for training or fine-tuning.
- Follow industry trends in the data science and AI domains and execute proofs of concept with advanced techniques.
- Be familiar with working in a lean-agile framework with teams that deliver business value incrementally.
Selection Criteria:
- Advanced degree in Computer Science, Data Science, or Data Engineering, or a bachelor’s degree with a minimum of 3 years of prior experience working in teams of AI, data, and analytics professionals to deliver business-driven analytics projects using natural language processing and machine learning on unstructured data.
- Understanding of data preparation, machine learning, deep learning, and natural language processing, and the ability to discuss mathematical formulations, alternatives, and their impact on modeling approaches.
- In-depth understanding of Text Analytics and Natural Language Processing concepts such as Lemmatization, Word Segmentation, Part-of-Speech Tagging, Stemming, Named-Entity Recognition, Word2Vec, and Doc2Vec.
- Experience in fine-tuning pre-trained large language models, such as GPT-3, BERT, Facebook’s Llama, Databricks’ Dolly, on a specific task or dataset to improve performance.
- Knowledge of creating a new language model from scratch, including designing a new architecture, collecting and cleaning training data, and training the model.
- Ability to develop applications that use large language models, such as chatbots, recommendation systems, or automated summarization tools.
- Ability to perform prompt engineering to improve large model outcomes.
- Experience in building multimodal models, i.e., models that integrate and analyze data from multiple sources or modalities, such as text, images, audio, video, and other types of structured and unstructured data.
- Ability to quickly use and implement the latest NLP research and approaches.
- Advanced expertise in Python (PySpark) and specialized machine learning libraries/packages for implementing machine learning models.
- Proficient in Python (PySpark) data analysis and ML libraries like pandas, NumPy, and scikit-learn.
- At least three years of experience using one or more of the following deep learning frameworks: TensorFlow, Keras, MLflow, PyTorch, etc.
- Work experience with one or more of the standard NLP models and tools like Google BERT, RoBERTa, spaCy, NLTK, Stanford CoreNLP, and LangChain.
- Proficient with parallel processing APIs such as Apache Spark and PySpark.
- Proficient in Machine Translation and Optical Character Recognition (OCR) for complex document processing (PDF, Word documents, scanned documents).
- Work experience in building Jupyter notebooks for conducting data science operations.
- Experience performing data science tasks (data discovery, cleaning, model selection, validation, and deployment).
- Experience coding artificial intelligence methods and restructuring, refactoring, and optimizing code for efficiency.
- Understanding of development practices such as testing, code design, complexity, and code optimization.
- Experience with cloud platforms (Azure Databricks and AWS).
- Solid problem-solving skills and effective verbal/written communication.
- Ability to work in multi-disciplinary teams.
- Understanding of non-financial risks, including environmental, social, and governance risks, is a plus.
- Strong expertise in MLOps, model serving at scale, and innovative ways of deploying ML models, such as plugins and add-ons, SharePoint add-ons, mobile apps, desktop apps, APIs, etc.
- Proficient with taxonomy management platforms, such as PoolParty or FIBO.
World Bank Group Core Competencies
We are proud to be an equal opportunity and inclusive employer with a dedicated and committed workforce, and do not discriminate based on gender, gender identity, religion, race, ethnicity, sexual orientation, or disability.
Learn more about working at the World Bank and IFC, including our values and inspiring stories.
Note: The selected candidate will be offered a one-year appointment, renewable for an additional one year, at the discretion of the World Bank Group, and subject to a lifetime maximum ET Appointment of two years. If an ET appointment ends before a full year, it is considered as a full year toward the lifetime maximum. Former and current ET staff who have completed all or any portion of their second-year ET appointment are not eligible for future ET appointments.