HoneyGuide
A multi-tenant SaaS survey pipeline platform for automated data collection, validation, facial recognition duplicate detection, and real-time field team analytics.
Overview
HoneyGuide is 2M Corp's proprietary survey pipeline and data quality platform — a comprehensive Django-based multi-tenant SaaS system designed for subscription-based data collection, automated validation, anonymised exports, and advanced analytics. Named after the honeyguide bird, known for guiding hunters to honey, the platform guides field teams and supervisors to quality issues in real time.
The platform integrates with major CAPI data collection tools, including ODK Central and KoBoToolbox, pulling survey submissions into automated processing pipelines that validate, clean, and transform data. Its facial recognition engine, powered by InsightFace AI and pgvector similarity search, screens photographs against the entire database to detect duplicate respondents and ghost records.
Capabilities
Data Collection & Integration
Multi-Platform Integration
Connects to ODK Central, KoBoToolbox, and SurveyCTO — pulling survey submissions, form definitions, and metadata into a unified pipeline.
Automated Data Processing
Validation, cleaning, and transformation pipelines powered by Great Expectations. Select-multiple fields are automatically expanded into binary columns with intelligent ordering.
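As a rough illustration of select-multiple expansion, the sketch below uses only the Python standard library. It assumes answers arrive as space-separated choice codes, as in XLSForm-based tools such as ODK and KoBoToolbox. The function name `expand_select_multiple` and the `field_choice` column naming are hypothetical, and the columns simply follow the order of the supplied choice list, which may differ from the platform's own ordering.

```python
from typing import Dict, List, Optional


def expand_select_multiple(
    rows: List[Dict[str, Optional[str]]],
    field: str,
    choices: List[str],
) -> List[Dict[str, object]]:
    """Return copies of rows with one 0/1 column per choice for `field`."""
    expanded = []
    for row in rows:
        raw = row.get(field)
        new_row: Dict[str, object] = dict(row)
        # A missing answer stays missing rather than becoming all zeros.
        selected = None if raw is None else set(raw.split())
        for choice in choices:
            column = f"{field}_{choice}"  # hypothetical naming scheme
            new_row[column] = None if selected is None else int(choice in selected)
        expanded.append(new_row)
    return expanded


rows = [
    {"id": "a1", "crops": "maize rice"},
    {"id": "a2", "crops": "rice"},
    {"id": "a3", "crops": None},
]
result = expand_select_multiple(rows, "crops", ["maize", "rice", "millet"])
```

Keeping unanswered questions as missing values, rather than zeros, avoids confusing "not asked" with "not selected" in later analysis.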
Multi-Format Export
Export processed datasets in CSV, Excel, SPSS, and Stata formats with post-serialisation anonymisation for client-facing deliverables.
Quality Assurance
Automated Validation Suites
Configurable validation rules check incoming data for completeness, range violations, skip logic errors, and cross-field inconsistencies. Results are tracked with detailed issue reporting.
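The four kinds of check named above can be sketched as small rule factories in plain Python, independent of any particular validation library. All rule names, field names, and the issue format here are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Rule = Callable[[Record], Optional[str]]


def required(field: str) -> Rule:
    """Completeness: the field must be present and non-empty."""
    def check(rec: Record) -> Optional[str]:
        return f"{field} is missing" if rec.get(field) in (None, "") else None
    return check


def in_range(field: str, low: float, high: float) -> Rule:
    """Range: a present value must lie within [low, high]."""
    def check(rec: Record) -> Optional[str]:
        value = rec.get(field)
        if value is None:  # completeness is handled by `required`
            return None
        if not (low <= value <= high):
            return f"{field}={value} outside [{low}, {high}]"
        return None
    return check


def skip_logic(trigger: str, trigger_value: object, dependent: str) -> Rule:
    """Skip logic: `dependent` may only be answered when trigger matches."""
    def check(rec: Record) -> Optional[str]:
        answered = rec.get(dependent) not in (None, "")
        if answered and rec.get(trigger) != trigger_value:
            return f"{dependent} answered but {trigger} != {trigger_value!r}"
        return None
    return check


def not_greater_than(field_a: str, field_b: str) -> Rule:
    """Cross-field: field_a must not exceed field_b."""
    def check(rec: Record) -> Optional[str]:
        a, b = rec.get(field_a), rec.get(field_b)
        if a is not None and b is not None and a > b:
            return f"{field_a}={a} exceeds {field_b}={b}"
        return None
    return check


def validate(records: List[Record], rules: List[Rule]) -> List[Dict[str, object]]:
    """Run every rule on every record and collect the issues found."""
    issues = []
    for rec in records:
        for rule in rules:
            message = rule(rec)
            if message:
                issues.append({"record_id": rec.get("id"), "issue": message})
    return issues


rules = [
    required("age"),
    in_range("age", 0, 120),
    skip_logic("employed", "yes", "employer"),
    not_greater_than("children", "household_size"),
]
records = [
    {"id": 1, "age": 34, "employed": "yes", "employer": "Ministry", "household_size": 5, "children": 2},
    {"id": 2, "age": 140, "employed": "no", "employer": "Acme", "household_size": 3, "children": 4},
    {"id": 3, "age": None, "employed": "yes", "employer": "", "household_size": 2, "children": 0},
]
issues = validate(records, rules)
```

Returning one issue per failed rule, keyed by record, keeps the output easy to aggregate into per-enumerator or per-form issue reports.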
Data Cleaning Engine
User-friendly cleaning interface with dynamic parameter fields — no JSON required. Supports clamping, recoding, regex replacement, date parsing, whitespace trimming, and manual corrections.
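To make the listed operations concrete, here is a minimal sketch of how cleaning steps with named parameters might be applied to a record. The operation registry, the step dictionary layout, and the field names are assumptions made for this example; they do not describe HoneyGuide's internal representation.

```python
import re
from datetime import datetime
from typing import Callable, Dict, List


def clamp(value, low, high):
    """Force a numeric value into the range [low, high]."""
    return max(low, min(high, value))


def recode(value, mapping):
    """Replace a value using a lookup table, leaving unknown values as-is."""
    return mapping.get(value, value)


def regex_replace(value, pattern, replacement):
    """Substitute every regex match in a string."""
    return re.sub(pattern, replacement, value)


def trim(value):
    """Remove leading and trailing whitespace."""
    return value.strip()


def parse_date(value, fmt):
    """Parse a date string in the given format and return ISO 8601 text."""
    return datetime.strptime(value, fmt).date().isoformat()


# Hypothetical registry mapping operation names to functions.
OPERATIONS: Dict[str, Callable] = {
    "clamp": clamp,
    "recode": recode,
    "regex_replace": regex_replace,
    "trim": trim,
    "parse_date": parse_date,
}


def apply_steps(record: Dict[str, object], steps: List[dict]) -> Dict[str, object]:
    """Apply cleaning steps in order, returning a cleaned copy of the record."""
    cleaned = dict(record)
    for step in steps:
        field = step["field"]
        if cleaned.get(field) is None:
            continue  # nothing to clean for a missing value
        operation = OPERATIONS[step["op"]]
        cleaned[field] = operation(cleaned[field], **step.get("params", {}))
    return cleaned


record = {"name": "  Awa  Jallow ", "age": 130, "sex": "F", "visit_date": "03/02/2024"}
steps = [
    {"op": "trim", "field": "name"},
    {"op": "regex_replace", "field": "name", "params": {"pattern": r"\s+", "replacement": " "}},
    {"op": "clamp", "field": "age", "params": {"low": 0, "high": 120}},
    {"op": "recode", "field": "sex", "params": {"mapping": {"F": "female", "M": "male"}}},
    {"op": "parse_date", "field": "visit_date", "params": {"fmt": "%d/%m/%Y"}},
]
cleaned = apply_steps(record, steps)
```

Because each step names its operation and parameters explicitly, a form-based interface can generate the same structure from input fields without the user writing JSON by hand.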
Back-Check Workflows
Dataset comparison tools that cross-reference original submissions with back-check interviews to verify data integrity and flag discrepancies.
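A back-check comparison of this kind can be sketched as a keyed join followed by a field-by-field comparison. The function name, key field, and discrepancy format below are hypothetical, and the example uses exact equality, whereas a real workflow might allow tolerances for some fields.

```python
from typing import Dict, List


def compare_back_checks(
    originals: List[Dict[str, object]],
    back_checks: List[Dict[str, object]],
    key: str,
    fields: List[str],
) -> List[Dict[str, object]]:
    """List every field where a back-check disagrees with the original."""
    originals_by_key = {row[key]: row for row in originals}
    discrepancies = []
    for check in back_checks:
        original = originals_by_key.get(check[key])
        if original is None:
            # The back-check refers to a respondent with no original record.
            discrepancies.append({"key": check[key], "field": None,
                                  "note": "no matching original submission"})
            continue
        for field in fields:
            if original.get(field) != check.get(field):
                discrepancies.append({
                    "key": check[key],
                    "field": field,
                    "original": original.get(field),
                    "back_check": check.get(field),
                })
    return discrepancies


originals = [
    {"hh_id": "H1", "head_age": 45, "hh_size": 6},
    {"hh_id": "H2", "head_age": 30, "hh_size": 4},
]
back_checks = [
    {"hh_id": "H1", "head_age": 45, "hh_size": 5},
    {"hh_id": "H3", "head_age": 50, "hh_size": 2},
]
discrepancies = compare_back_checks(originals, back_checks, "hh_id", ["head_age", "hh_size"])
```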
Face Verification & Biometrics
Duplicate Detection
InsightFace AI-powered facial recognition screens every photograph against the entire database using pgvector similarity search, identifying duplicate respondents and preventing fraud.
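The idea behind similarity-based duplicate screening can be sketched with a brute-force cosine similarity search in plain Python. This stands in for the pgvector query and is not the platform's implementation: the three-dimensional vectors are toy values (real face embeddings have many more dimensions), and the 0.9 threshold is an arbitrary assumption for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def find_duplicates(
    new_embedding: List[float],
    stored: Dict[str, List[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return stored record ids whose similarity meets the threshold, best first."""
    scored = [
        (record_id, cosine_similarity(new_embedding, embedding))
        for record_id, embedding in stored.items()
    ]
    matches = [(record_id, score) for record_id, score in scored if score >= threshold]
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Toy embeddings keyed by hypothetical record ids.
stored = {
    "r1": [1.0, 0.0, 0.0],
    "r2": [0.0, 1.0, 0.0],
    "r3": [0.9, 0.1, 0.0],
}
matches = find_duplicates([1.0, 0.05, 0.0], stored, threshold=0.9)
```

A vector index inside the database avoids this full scan, which matters when every new photograph is compared with tens of thousands of stored ones. In either case, a match above the threshold is a candidate for human review, not proof of a duplicate.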
Analytics & Dashboards
Real-Time Field Monitoring
Track enumerator performance in real time: submission rates, interview duration, GPS coverage, rejection rates, and quality scores.
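Some of these indicators reduce to simple per-enumerator aggregates. The sketch below computes submission counts, mean interview duration, and rejection rate from a list of submissions; the field names (`enumerator`, `duration_min`, `status`) are hypothetical, and GPS coverage and quality scores are left out because the source does not define them.

```python
from collections import defaultdict
from typing import Dict, List


def enumerator_summary(submissions: List[dict]) -> Dict[str, dict]:
    """Aggregate submission count, mean duration, and rejection rate per enumerator."""
    grouped = defaultdict(list)
    for submission in submissions:
        grouped[submission["enumerator"]].append(submission)

    summary = {}
    for enumerator, subs in grouped.items():
        durations = [s["duration_min"] for s in subs]
        rejected = sum(1 for s in subs if s["status"] == "rejected")
        summary[enumerator] = {
            "submissions": len(subs),
            "mean_duration_min": sum(durations) / len(durations),
            "rejection_rate": rejected / len(subs),
        }
    return summary


submissions = [
    {"enumerator": "E01", "duration_min": 30, "status": "approved"},
    {"enumerator": "E01", "duration_min": 40, "status": "rejected"},
    {"enumerator": "E02", "duration_min": 20, "status": "approved"},
]
summary = enumerator_summary(submissions)
```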
Advanced Calculation Engine
Statistical analysis powered by NumPy and Pandas — trend detection, performance insights, team scoring, and geographic analysis with multi-layer caching for optimal performance.
Client Portal
Anonymised data exports and analytics dashboards for external stakeholders, with role-based access control and project-level data isolation.
Platform Architecture
Multi-Tenant SaaS
Organisation-based isolation with subscription management. Tier-based plans (Free, Starter, Professional, Enterprise) with usage tracking and configurable limits.
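Tier-based limits of this kind are often modelled as a lookup table consulted before an action is allowed. The sketch below is one way to express that; the tier names come from the list above, but every numeric limit, the `Plan` fields, and the function name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Plan:
    name: str
    max_projects: Optional[int]  # None means unlimited
    max_submissions_per_month: Optional[int]  # None means unlimited


# Illustrative limits only; the real plan limits are not stated in the source.
PLANS: Dict[str, Plan] = {
    "free": Plan("Free", 1, 500),
    "starter": Plan("Starter", 5, 5_000),
    "professional": Plan("Professional", 25, 50_000),
    "enterprise": Plan("Enterprise", None, None),
}


def can_accept_submission(tier: str, submissions_this_month: int) -> bool:
    """Check whether an organisation on `tier` is still under its monthly limit."""
    limit = PLANS[tier].max_submissions_per_month
    return limit is None or submissions_this_month < limit


free_at_limit = can_accept_submission("free", 500)
starter_under_limit = can_accept_submission("starter", 4_999)
enterprise_unlimited = can_accept_submission("enterprise", 10_000_000)
```

Representing "unlimited" as `None` keeps the check explicit and avoids relying on an arbitrarily large sentinel number.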
Azure Cloud Integration
Azure Blob Storage for media files, Key Vault for secrets management, Monitor and Application Insights for observability, and Communication Services for transactional emails.
Deployments
PAMP Civil Service Verification — The Gambia
Ministry of Public Service / World Bank
Facial recognition-based screening of approximately 50,000 civil servants and 8,000 pensioners, with every photograph checked against the entire database. HoneyGuide served as the quality assurance backbone, providing real-time validation and biometric duplicate detection across the field teams.
MEISS Midline Education Evaluation — The Gambia
Ministry of Basic & Secondary Education / World Bank
Real-time quality assurance for EGRA/EGMA assessments across approximately 120 schools and 8,000+ students. HoneyGuide monitored enumerator performance and data consistency, and flagged anomalies for supervisor review.
KMC Open Location Codes Research — The Gambia
University of Essex / Kanifing Municipal Council
Quality monitoring for approximately 4,000 household surveys in the Kanifing Municipality, ensuring data integrity for the randomised controlled trial evaluation of open location code adoption.