Terms of Reference (ToR) for Data Science Expert - AI Bias Reduction & Inclusive Data Generation Engine
Background
AfriLabs
AfriLabs is a network organization that supports innovation hubs, tech communities, and entrepreneurship in Africa. With a presence across 53 countries, AfriLabs plays a key role in fostering the growth of innovative tech solutions that address critical challenges in Africa. By connecting innovators, providing resources, and promoting collaboration, AfriLabs drives entrepreneurship and technological advancements across the continent. The organisation has been a strategic partner in various initiatives aimed at empowering African innovators, including organizing hackathons, incubation programs, and capacity-building initiatives.
Gates Foundation and Meta
The Gates Foundation, through its Gender Equality Digital Connectivity (GEDC) and Digital Public Infrastructure (DPI) teams, is committed to fostering digital inclusion and gender equality across Africa. The GEDC team focuses on developing digital tools and technologies that are sensitive to gender issues, ensuring equitable access to information and services for women.
Meta (formerly Facebook) is a global technology leader, known for its work in advancing artificial intelligence (AI), machine learning, and open-source tools. Meta supports the Llama Impact Grants Program, which seeks to empower innovators and startups to leverage open-source AI models like Llama to address societal challenges, including gender sensitivity, inclusivity, and linguistic diversity.
Together, the Gates Foundation and Meta have partnered to launch a series of programs aimed at improving AI outputs in African languages and ensuring that these solutions are gender-sensitive, linguistically diverse, and culturally relevant.
About the Project
The AI Bias Reduction & Inclusive Data Generation Engine programme is a key initiative designed to engage innovators across Africa in the development of AI-driven solutions that address gender sensitivity, particularly in African languages. The goal is to create innovative AI models that are inclusive and free from gender bias, promoting equitable access to information for both women and men in African communities.
In Phase 1, AfriLabs convened a hackathon that engaged 99 participants, forming 19 country-based teams that developed prototypes over the course of an intensive competition. The top eight winning teams proceeded to apply for the Llama Impact Grants, a fund that supports innovative applications of Meta's open-source AI model, Llama, to address pressing societal challenges.
These teams have now advanced to Phase 2, where four of the eight winning teams have been selected to receive mentor support to develop a gender sanitization engine that will analyse and clean outputs in 17 pre-determined African languages. Consultancy is a critical component of this initiative, as it provides participants with the guidance needed to refine their work on the gender sanitization engine.
Project Objectives
The AI BRIDGE (AI Bias Reduction & Inclusive Data Generation Engine) initiative aims to advance the development of gender-sensitive, inclusive AI tools by engaging African innovators in dataset-level bias mitigation. The primary objectives of Phase 2 of this initiative are to:
- Detect and mitigate gender bias within datasets used for training AI/LLM systems.
- Support dataset curators and AI developers with tools to rewrite biased content while maintaining meaning.
- Promote inclusive AI development across African and global linguistic contexts.
- Enable scalable, modular integration across any AI model pipeline.
This phase builds on the outputs of the Phase 1 hackathon and will empower selected teams to build model-agnostic, upstream bias detection and rewriting pipelines that enhance fairness and representation in AI systems within the African context.
Project Methodology
The implementation of AI BRIDGE Phase 2 will follow a structured, modular, and academically rigorous approach. It will blend automated processes with human oversight to ensure gender sensitivity and cultural accuracy in AI training datasets. The methodology is divided into seven critical stages:
- Data Ingestion
- Source/Curate diverse structured and unstructured datasets across targeted African languages and sectors.
- Align data sources with use cases relevant to gender representation and inclusion.
- Bias Detection
- Apply hybrid scanning techniques (rule-based logic, machine learning classifiers, and statistical metrics) to identify gender-biased, stereotypical, or exclusionary content.
- Highlight problematic language using linguistic and contextual tagging.
- Content Analysis
- Conduct contextual tagging of flagged entries, including metadata classification (e.g., severity, stereotype type, grammatical role).
- Organize entries into review-ready clusters for annotation and sanitization.
- Content Sanitization
- Deploy transformer-based models to suggest rewritten alternatives that are semantically consistent yet bias-free.
- Incorporate a controlled vocabulary for gender-sensitive rewriting.
- Human-in-the-Loop Validation
- Facilitate expert review from gender specialists, linguists, and cultural advisors to validate rewrites in context-sensitive cases.
- Enable override or acceptance workflows using customized annotation tools.
- Dataset Finalization
- Finalize sanitized datasets, version-control all changes, and document transformation history for audit and transparency.
- Deliver outputs with full traceability and aligned with ethical AI documentation standards.
- Feedback and Continuous Improvement
- Maintain a feedback loop for rejected or complex cases to improve rule sets and classifier accuracy.
- Integrate feedback into successive model training iterations and guideline updates.
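For illustration, the hybrid scanning idea in the Bias Detection and Content Analysis stages could be sketched as follows. All rule patterns, tag names, and severity labels here are hypothetical placeholders, not a prescribed rule set; a real deployment would maintain curated, per-language rule sets alongside machine learning classifiers.

```python
import re

# Hypothetical rule patterns flagging gendered generics and stereotype pairings.
RULES = {
    "gendered_generic": re.compile(r"\b(chairman|manpower|mankind)\b", re.IGNORECASE),
    "stereotype_pairing": re.compile(r"\b(nurse|secretary)\b.*\b(she|her)\b", re.IGNORECASE),
}

def scan(text: str) -> list[dict]:
    """Return flagged spans with contextual tags (stages 2 and 3 of the pipeline)."""
    findings = []
    for tag, pattern in RULES.items():
        for match in pattern.finditer(text):
            findings.append({
                "tag": tag,                # stereotype type for metadata classification
                "span": match.span(),      # character offsets for reviewer highlighting
                "surface": match.group(0), # the flagged surface form
                "severity": "review",      # placeholder; a classifier would score this
            })
    return findings

flags = scan("The chairman asked his secretary to take notes; she agreed.")
```

In a full pipeline, these rule-based findings would be merged with classifier scores and clustered into review-ready batches for the Human-in-the-Loop Validation stage.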
Purpose of the Assignment
AfriLabs is seeking a data science expert to support the development of AI-driven gender-sensitive tools for African languages. The expert will provide technical guidance, assess linguistic and gender biases in model outputs, and support the integration of inclusive practices within the AI systems being developed. This role requires a strong understanding of AI technologies, the African innovation landscape, and familiarity with the nuances of African sociolinguistic contexts.
Scope of Work
The data science expert will be responsible for the following:
- Technical Advisory and Methodological Guidance:
- Provide strategic guidance on the design of model-agnostic bias detection and dataset sanitization architecture, advising teams on proven NLP and ML approaches to mitigate gender bias.
- Offer expert input on data pipeline architecture, ensuring modularity, auditability, and scalability across multilingual datasets and multiple use cases.
- Lead knowledge transfer sessions with participating teams to align them on best practices for ethical, fair, and inclusive AI dataset development.
- Advisory on Bias Detection Frameworks:
- Recommend appropriate rule-based and machine learning techniques to be used for gender bias identification in textual datasets.
- Guide teams in choosing or fine-tuning bias detection models, drawing from existing tools or custom classification algorithms.
- Review teams’ detection model prototypes and provide critical feedback on bias identification accuracy, false positives/negatives, and ethical implications.
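The feedback on false positives and false negatives mentioned above can be grounded in simple counting. As a minimal sketch, assuming flagged entries are identified by numeric IDs and compared against expert-annotated gold labels, a consultant might compute detection accuracy like this:

```python
def detection_metrics(predicted: set, gold: set) -> dict:
    """Compare a detector's flagged entry IDs against expert-annotated gold labels."""
    tp = len(predicted & gold)  # correctly flagged biased entries
    fp = len(predicted - gold)  # false positives: benign text flagged
    fn = len(gold - predicted)  # false negatives: biased text missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall,
            "false_positives": fp, "false_negatives": fn}

# Example: the detector flags entries 1, 2, 5; annotators marked 1, 2, 3 as biased.
report = detection_metrics({1, 2, 5}, {1, 2, 3})
```

The ethical implications noted in the scope (e.g., whether false negatives are costlier than false positives in a given language community) would then guide which of these numbers to prioritize.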
- NLP Methodology & Dataset Structuring Support:
- Support teams in developing annotated datasets by advising on tagging schemas that capture types of bias, language-specific patterns, and cultural context.
- Help select and refine tokenization strategies, language models (e.g., spaCy, Hugging Face Transformers), and embedding techniques that are effective for African languages and low-resource environments.
- Review and advise on the use of contextual metadata in dataset tagging workflows.
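A tagging schema of the kind described above (capturing severity, stereotype type, and grammatical role) could be represented as a lightweight record type. The field names and example values below are illustrative assumptions, not a fixed schema; in practice the schema would be co-designed with linguists for each target language.

```python
from dataclasses import dataclass, asdict

@dataclass
class FlaggedEntry:
    """One review-ready record in the annotation workflow (illustrative fields)."""
    text: str
    language: str          # language code of the dataset entry
    stereotype_type: str   # e.g. "occupational", "generic-masculine"
    severity: str          # e.g. "low" | "medium" | "high"
    grammatical_role: str  # where the biased term sits, e.g. "subject"

entry = FlaggedEntry(
    text="All doctors must bring their wives to the gala.",
    language="en",  # placeholder; the project targets 17 African languages
    stereotype_type="generic-masculine",
    severity="medium",
    grammatical_role="subject",
)
record = asdict(entry)  # serializable metadata for the tagging workflow
```

Serializing entries this way keeps the contextual metadata attached to each flagged item as it moves from detection into review-ready clusters.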
- Guidance on Rewriting Models and Approaches:
- Advise on methodologies for context-aware rewriting of biased or exclusionary language using transformer-based models or controlled vocabularies.
- Provide strategic input on how to balance automated rewriting and human-in-the-loop review, and help define criteria for what types of rewrites require expert validation.
- Review rewrite modules or logic developed by the teams and suggest improvements for semantic preservation and cultural relevance.
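The controlled-vocabulary approach mentioned above can be sketched as a substitution pass with an audit trail. The vocabulary entries below are hypothetical English placeholders; a production lexicon would be curated per language by gender specialists, and transformer-based rewriting would handle cases this lookup cannot.

```python
import re

# Illustrative controlled vocabulary: biased term -> gender-neutral alternative.
CONTROLLED_VOCABULARY = {
    "chairman": "chairperson",
    "policeman": "police officer",
    "manpower": "workforce",
}

def rewrite(text: str) -> tuple[str, list[str]]:
    """Apply vocabulary substitutions; return rewritten text and an audit trail."""
    applied = []
    for biased, neutral in CONTROLLED_VOCABULARY.items():
        pattern = re.compile(rf"\b{re.escape(biased)}\b", re.IGNORECASE)
        if pattern.search(text):
            text = pattern.sub(neutral, text)
            applied.append(f"{biased} -> {neutral}")
    return text, applied

new_text, audit = rewrite("The chairman praised the manpower of the team.")
```

The audit trail is what allows human reviewers to accept or override each substitution individually, rather than approving rewrites wholesale.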
- Human-in-the-Loop & Ethical Oversight Mechanism Design:
- Co-design human-in-the-loop validation frameworks that enable gender experts and linguists to review, approve, or override automated outputs.
- Help set up data governance protocols to ensure transparency, auditability, and reproducibility across validation stages.
- Train or orient teams on how to interpret reviewer feedback and incorporate it into model or rule refinement.
- Monitoring, Evaluation and Continuous Learning Design:
- Define technical performance metrics (e.g., Detection Precision, Rewrite Acceptance Rate, Bias Reduction Score).
- Advise on methods to create feedback loops that continuously improve model and rule performance based on real-world data and validation outcomes.
- Support the development of dashboards or review logs for progress tracking and decision-making.
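Two of the named metrics can be derived directly from a reviewer decision log. The log fields below (`flagged`, `decision`, `truly_biased`) are assumed names for illustration; the actual schema would follow the annotation tooling the teams adopt.

```python
def review_metrics(log: list[dict]) -> dict:
    """Aggregate reviewer decisions into illustrative KPIs for a progress dashboard."""
    flagged = [e for e in log if e["flagged"]]
    accepted = [e for e in flagged if e["decision"] == "accept"]
    confirmed = [e for e in flagged if e["truly_biased"]]
    return {
        # share of machine rewrites accepted by human reviewers
        "rewrite_acceptance_rate": len(accepted) / len(flagged) if flagged else 0.0,
        # share of flags that experts confirmed as genuinely biased
        "detection_precision": len(confirmed) / len(flagged) if flagged else 0.0,
    }

# Example log: four flagged entries, three accepted, one overridden as a false flag.
log = [
    {"flagged": True,  "decision": "accept",   "truly_biased": True},
    {"flagged": True,  "decision": "accept",   "truly_biased": True},
    {"flagged": True,  "decision": "override", "truly_biased": False},
    {"flagged": True,  "decision": "accept",   "truly_biased": True},
    {"flagged": False, "decision": None,       "truly_biased": False},
]
kpis = review_metrics(log)
```

Overridden entries feed the continuous-improvement loop described in the methodology: each false flag is a candidate for refining the rule sets or retraining the classifiers.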
- Support on API & Tooling Strategy:
- Recommend API structures and data formats that ensure ease of integration across different data processing pipelines.
- Guide teams on best practices for modular design, ensuring the system remains adaptable to various AI architectures.
- Review documentation and give input on how to expose bias mitigation functions as RESTful API endpoints for downstream use.
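A modular, framework-agnostic request/response contract keeps the engine adaptable across pipelines. The sketch below assumes a hypothetical `POST /v1/sanitize` endpoint and a placeholder rewrite; the handler is plain JSON-in/JSON-out, so it can be wrapped by any web framework as a thin adapter.

```python
import json

def sanitize_handler(request_body: str) -> str:
    """Framework-agnostic handler for a hypothetical POST /v1/sanitize endpoint.

    Expects {"text": ..., "language": ...}; returns the original text, a
    rewritten version, and a change flag. A real service would call the
    detection and rewriting modules rather than the placeholder below.
    """
    payload = json.loads(request_body)
    text = payload["text"]
    rewritten = text.replace("chairman", "chairperson")  # placeholder logic
    response = {
        "language": payload.get("language", "und"),
        "original": text,
        "rewritten": rewritten,
        "changed": rewritten != text,
    }
    return json.dumps(response)

reply = json.loads(sanitize_handler('{"text": "The chairman spoke.", "language": "en"}'))
```

Keeping the contract this small (text in, annotated rewrite out) is what makes the engine model-agnostic: downstream AI pipelines integrate it as an upstream preprocessing call regardless of their own architecture.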
Required Qualifications and Experience
- General Qualifications
The expert is expected to possess the following qualifications:
- A Master’s degree or equivalent experience in data science, statistics, or a related quantitative field.
- Demonstrated experience in NLP, bias detection, or ethical ML applications, especially within multilingual or culturally diverse contexts.
- Specialized Qualifications and Experience
All Consultants should be qualified in two or more of the following areas:
- Proven experience in applying statistical techniques to measure and interpret bias within datasets, including gender-based or socio-linguistic bias.
- Strong proficiency in using data analysis and scientific computing libraries (e.g., pandas, NumPy, SciPy, matplotlib) within Python-based environments.
- Familiarity with dataset versioning and collaborative workflow tools such as Git, DVC, and Jupyter Notebooks.
- Previous involvement in projects that applied ethical or inclusive data science principles is highly desirable.
Proposed/Tentative timeline of deliverables
- Selected Consultants are expected to carry out this assignment over a total of five months, from 7th August 2025 to 31st December 2025.
- The assignment will be carried out remotely.
Method Of Application
Applying as an Individual:
- Filled Vendor request form
- Send a Cover Letter, CV or portfolio stating profile and relevant experience.
- Means of Identification.
- Financial proposal (This must be in USD)
- Submission of organization reference (i.e., name, email, and phone number of an organization to which you have provided similar services)
- A sample of previous review work, preferably related to competitive innovation or grant programs.
Applying as a company:
- Filled Vendor Request Form
- Send your company’s certificate.
- Company registration document.
- Cover letter and CV of relevant experience of the applicant or Team Lead.
- Means of Identification of the team lead.
- Financial proposal (This must be in USD)
- Submission of organization reference (i.e., name, email, and phone number of an organization to which you have provided similar services)
- A sample of previous review work, preferably related to competitive innovation or grant programs.
Evaluation Criteria:
Evaluation of each applicant will be on the basis of the following:
Technical Proposal (65%)
- Submission of the requested documents as outlined in the method of application for each category
- Proof of required qualification and expertise
- Proof of work done in relation to the requested scope
Financial Proposal (35%)
- Financial proposal should be submitted in USD and kindly indicate if the amount proposed is negotiable or not.
- Validity of the financial proposal should be minimum of 6 months.
Note the following:
- Please download and use the Vendor information form attached below to submit your application to procurement@afrilabs.com with the email subject: “AI BRIDGE DATA SCIENCE EXPERT”. Applicants should submit their applications on or before COB August 15th, 2025.
- Kindly find the Vendor information form. Please download it before use; DO NOT edit the uploaded template.
- Paid Hub members are strongly advised to apply with their company name (If within their thematic areas and expertise).
- Women are strongly encouraged to apply
- Only the shortlisted applicants will proceed to the next stage
- Remuneration Range: $3,500 – $5,000