HoneyGuide

A multi-tenant SaaS survey pipeline platform for automated data collection, validation, facial recognition duplicate detection, and real-time field team analytics.

Overview

HoneyGuide is 2M Corp's proprietary survey pipeline and data quality platform — a comprehensive Django-based multi-tenant SaaS system designed for subscription-based data collection, automated validation, anonymised exports, and advanced analytics. Named after the honeyguide bird, known for guiding hunters to honey, the platform guides field teams and supervisors to quality issues in real time.

The platform integrates with major CAPI data collection tools, including ODK Central and KoBoToolbox, pulling survey submissions into automated processing pipelines that validate, clean, and transform data. Its facial recognition engine, powered by InsightFace AI and pgvector similarity search, screens photographs against the entire database to detect duplicate respondents and ghost records.

Capabilities

Data Collection & Integration

Multi-Platform Integration

Connects to ODK Central, KoBoToolbox, and SurveyCTO — pulling survey submissions, form definitions, and metadata into a unified pipeline.

Automated Data Processing

Validation, cleaning, and transformation pipelines powered by Great Expectations. Select-multiple fields are automatically expanded into binary columns with intelligent ordering.
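
As a rough illustration of the select-multiple expansion, the sketch below assumes the ODK/KoBoToolbox convention of storing selected choices as a space-separated string. The function name, the `field_choice` column naming, and the choice-list ordering are assumptions made for this example; the platform's own ordering rule is not described here, and this version uses plain Python rather than Pandas.

```python
from typing import Dict, List, Optional


def expand_select_multiple(
    rows: List[Dict[str, Optional[str]]],
    field: str,
    choices: List[str],
) -> List[Dict[str, object]]:
    """Expand a space-separated select-multiple field into 0/1 columns.

    Column order follows `choices` (assumed to be the form-definition order).
    """
    expanded = []
    for row in rows:
        raw = row.get(field)
        selected = set(raw.split()) if raw else set()
        new_row: Dict[str, object] = {}
        for key, value in row.items():
            new_row[key] = value
            if key == field:
                # Place the binary columns directly after the source column.
                for choice in choices:
                    new_row[f"{field}_{choice}"] = 1 if choice in selected else 0
        expanded.append(new_row)
    return expanded


rows = [
    {"id": "a1", "water_sources": "well river", "age": "34"},
    {"id": "a2", "water_sources": "tap", "age": "51"},
    {"id": "a3", "water_sources": None, "age": "28"},
]
result = expand_select_multiple(rows, "water_sources", ["tap", "well", "river"])
```

Keeping the original column next to its binary columns makes it easy to audit the expansion in an exported sheet.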

Multi-Format Export

Export processed datasets in CSV, Excel, SPSS, and Stata formats with post-serialization anonymisation for client-facing deliverables.

Quality Assurance

Automated Validation Suites

Configurable validation rules check incoming data for completeness, range violations, skip logic errors, and cross-field inconsistencies. Results are tracked with detailed issue reporting.
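
The four kinds of check named above can be sketched as small rule functions applied to each submission. This is a plain-Python illustration only: it does not use the Great Expectations API, and the rule helpers, field names, and `Issue` structure are all hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Optional

Record = Dict[str, object]
Rule = Callable[[Record], Optional[str]]  # returns a message when the rule fails


class Issue(NamedTuple):
    submission_id: str
    rule: str
    message: str


def required(field: str) -> Rule:
    """Completeness: the field must be present and non-empty."""
    def check(rec: Record) -> Optional[str]:
        return f"{field} is missing" if rec.get(field) in (None, "") else None
    return check


def in_range(field: str, low: float, high: float) -> Rule:
    """Range: a present value must lie within [low, high]."""
    def check(rec: Record) -> Optional[str]:
        value = rec.get(field)
        if value is None:
            return None  # missing values are handled by `required`
        if not (low <= value <= high):
            return f"{field}={value} outside [{low}, {high}]"
        return None
    return check


def skip_logic(field: str, only_if: Callable[[Record], bool]) -> Rule:
    """Skip logic: the field may only be answered when `only_if` holds."""
    def check(rec: Record) -> Optional[str]:
        answered = rec.get(field) not in (None, "")
        if answered and not only_if(rec):
            return f"{field} answered but should have been skipped"
        return None
    return check


def cross_field(name: str, predicate: Callable[[Record], bool]) -> Rule:
    """Cross-field consistency: an arbitrary predicate over the record."""
    def check(rec: Record) -> Optional[str]:
        return None if predicate(rec) else f"cross-field check failed: {name}"
    return check


def validate(records: List[Record], rules: Dict[str, Rule]) -> List[Issue]:
    issues = []
    for rec in records:
        for rule_name, rule in rules.items():
            message = rule(rec)
            if message:
                issues.append(Issue(str(rec["id"]), rule_name, message))
    return issues


records = [
    {"id": "s1", "age": 34, "has_children": "yes", "children_count": 2, "hh_size": 4},
    {"id": "s2", "age": 140, "has_children": "no", "children_count": 3, "hh_size": 2},
    {"id": "s3", "age": None, "has_children": "no", "children_count": None, "hh_size": 1},
]
rules = {
    "age_required": required("age"),
    "age_range": in_range("age", 0, 120),
    "children_skip": skip_logic(
        "children_count", lambda r: r.get("has_children") == "yes"
    ),
    "children_le_hh": cross_field(
        "children_count <= hh_size",
        lambda r: (r.get("children_count") or 0) <= r["hh_size"],
    ),
}
issues = validate(records, rules)
```

Each `Issue` carries the submission ID and rule name, which is the minimum needed for the kind of per-issue reporting described above.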

Data Cleaning Engine

User-friendly cleaning interface with dynamic parameter fields — no JSON required. Supports clamping, recoding, regex replacement, date parsing, whitespace trimming, and manual corrections.
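
One way to picture these operations is as a list of parameterised steps, each naming an operation, a target field, and its parameters. The step format, operation names, and sample record below are assumptions for illustration, not the platform's actual configuration schema.

```python
import re
from datetime import datetime
from typing import Any, Callable, Dict, List


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def recode(value: Any, mapping: Dict[Any, Any]) -> Any:
    # Values without a mapping entry are left unchanged.
    return mapping.get(value, value)


def regex_replace(value: str, pattern: str, replacement: str) -> str:
    return re.sub(pattern, replacement, value)


def parse_date(value: str, fmt: str) -> str:
    # Normalise to ISO 8601 (YYYY-MM-DD).
    return datetime.strptime(value, fmt).date().isoformat()


def trim(value: str) -> str:
    return value.strip()


OPERATIONS: Dict[str, Callable[..., Any]] = {
    "clamp": clamp,
    "recode": recode,
    "regex_replace": regex_replace,
    "parse_date": parse_date,
    "trim": trim,
}


def apply_steps(record: Dict[str, Any], steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply cleaning steps in order, returning a cleaned copy of the record."""
    cleaned = dict(record)
    for step in steps:
        field = step["field"]
        if cleaned.get(field) is None:
            continue  # nothing to clean
        operation = OPERATIONS[step["op"]]
        cleaned[field] = operation(cleaned[field], **step.get("params", {}))
    return cleaned


steps = [
    {"op": "trim", "field": "village"},
    {"op": "regex_replace", "field": "phone",
     "params": {"pattern": r"\D", "replacement": ""}},
    {"op": "clamp", "field": "age", "params": {"low": 0, "high": 120}},
    {"op": "recode", "field": "sex",
     "params": {"mapping": {"M": "male", "F": "female"}}},
    {"op": "parse_date", "field": "visit_date", "params": {"fmt": "%d/%m/%Y"}},
]
raw = {"village": "  Kibera ", "phone": "+254 700-123", "age": 130,
       "sex": "F", "visit_date": "03/02/2024"}
cleaned = apply_steps(raw, steps)
```

Because the original record is copied rather than mutated, the raw submission stays available for audit alongside the cleaned values.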

Back-Check Workflows

Dataset comparison tools that cross-reference original submissions with back-check interviews to verify data integrity and flag discrepancies.
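
A back-check comparison of this kind can be sketched as a keyed join followed by a field-by-field diff. The function, the `respondent_id` key, and the compared fields are hypothetical; a real workflow would likely add tolerances for numeric fields and fuzzy matching for text.

```python
from typing import Any, Dict, List


def compare_backcheck(
    originals: List[Dict[str, Any]],
    backchecks: List[Dict[str, Any]],
    key: str,
    fields: List[str],
) -> List[Dict[str, Any]]:
    """Return one discrepancy entry per field that differs between paired records."""
    backcheck_by_key = {rec[key]: rec for rec in backchecks}
    discrepancies = []
    for original in originals:
        recheck = backcheck_by_key.get(original[key])
        if recheck is None:
            continue  # this respondent was not back-checked
        for field in fields:
            if original.get(field) != recheck.get(field):
                discrepancies.append({
                    key: original[key],
                    "field": field,
                    "original": original.get(field),
                    "backcheck": recheck.get(field),
                })
    return discrepancies


originals = [
    {"respondent_id": "r1", "age": 34, "hh_size": 4},
    {"respondent_id": "r2", "age": 51, "hh_size": 6},
    {"respondent_id": "r3", "age": 28, "hh_size": 3},
]
backchecks = [
    {"respondent_id": "r1", "age": 34, "hh_size": 5},
    {"respondent_id": "r2", "age": 51, "hh_size": 6},
]
discrepancies = compare_backcheck(
    originals, backchecks, key="respondent_id", fields=["age", "hh_size"]
)
```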

Face Verification & Biometrics

Duplicate Detection

InsightFace AI-powered facial recognition screens every photograph against the entire database using pgvector similarity search, identifying duplicate respondents and preventing fraud.
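
The core idea behind embedding-based duplicate detection is to compare a new face embedding against stored ones and flag any pair whose similarity exceeds a threshold. The sketch below does this in memory with cosine similarity; in a pgvector deployment the same comparison would normally run inside PostgreSQL (pgvector's `<=>` operator computes cosine distance). The 3-dimensional vectors, the respondent IDs, and the 0.6 threshold are illustrative assumptions, since real face embeddings are much longer and thresholds need tuning per model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def find_duplicates(
    new_embedding: Sequence[float],
    known: Dict[str, Sequence[float]],
    threshold: float = 0.6,  # assumed value, not a recommended setting
) -> List[Tuple[str, float]]:
    """Return (respondent_id, similarity) pairs at or above the threshold, best first."""
    scored = [
        (respondent_id, cosine_similarity(new_embedding, embedding))
        for respondent_id, embedding in known.items()
    ]
    matches = [(rid, score) for rid, score in scored if score >= threshold]
    return sorted(matches, key=lambda match: match[1], reverse=True)


known = {
    "resp-001": [0.9, 0.1, 0.0],
    "resp-002": [0.0, 1.0, 0.0],
    "resp-003": [0.1, 0.0, 0.95],
}
matches = find_duplicates([0.88, 0.12, 0.02], known)
```

A linear scan like this is only practical for small sets; screening against an entire database is what an indexed vector search is for.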

Analytics & Dashboards

Real-Time Field Monitoring

Track enumerator performance in real time: submission rates, interview duration, GPS coverage, rejection rates, and quality scores.
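
Several of these indicators reduce to simple per-enumerator aggregates. The following sketch computes submission counts, median interview duration, and rejection rate from a list of submissions; the record fields (`enumerator`, `duration_min`, `status`) are assumed names, and the platform itself is described as using NumPy and Pandas rather than the standard library used here.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, List


def enumerator_summary(submissions: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Aggregate basic performance indicators per enumerator."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for submission in submissions:
        grouped[submission["enumerator"]].append(submission)

    summary = {}
    for enumerator, subs in grouped.items():
        rejected = sum(1 for s in subs if s["status"] == "rejected")
        summary[enumerator] = {
            "submissions": len(subs),
            "median_duration_min": median(s["duration_min"] for s in subs),
            "rejection_rate": rejected / len(subs),
        }
    return summary


submissions = [
    {"enumerator": "amina", "duration_min": 32, "status": "approved"},
    {"enumerator": "amina", "duration_min": 28, "status": "approved"},
    {"enumerator": "amina", "duration_min": 35, "status": "rejected"},
    {"enumerator": "joseph", "duration_min": 9, "status": "rejected"},
    {"enumerator": "joseph", "duration_min": 11, "status": "approved"},
]
summary = enumerator_summary(submissions)
```

Unusually short median durations combined with a high rejection rate, as in the second enumerator here, are the kind of pattern a supervisor would want surfaced early.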

Advanced Calculation Engine

Statistical analysis powered by NumPy and Pandas — trend detection, performance insights, team scoring, and geographic analysis with multi-layer caching for optimal performance.

Client Portal

Anonymised data exports and analytics dashboards for external stakeholders, with role-based access control and project-level data isolation.

Platform Architecture

Multi-Tenant SaaS

Organisation-based isolation with subscription management. Tier-based plans (Free, Starter, Professional, Enterprise) with usage tracking and configurable limits.
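
Tier-based limits of this sort are often modelled as a small table of plans plus a check against current usage. In the sketch below only the four tier names come from the description above; the limit fields and every numeric value are placeholders invented for the example, not HoneyGuide's actual quotas.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Plan:
    name: str
    max_projects: Optional[int]               # None means unlimited
    max_submissions_per_month: Optional[int]  # None means unlimited


# Placeholder limits for illustration only.
PLANS: Dict[str, Plan] = {
    "free": Plan("Free", 1, 500),
    "starter": Plan("Starter", 5, 5_000),
    "professional": Plan("Professional", 25, 50_000),
    "enterprise": Plan("Enterprise", None, None),
}


def can_accept_submission(plan: Plan, used_this_month: int) -> bool:
    """True if one more submission fits within the plan's monthly limit."""
    limit = plan.max_submissions_per_month
    return limit is None or used_this_month < limit


free_ok = can_accept_submission(PLANS["free"], used_this_month=499)
free_full = can_accept_submission(PLANS["free"], used_this_month=500)
enterprise_ok = can_accept_submission(PLANS["enterprise"], used_this_month=10_000_000)
```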

Azure Cloud Integration

Azure Blob Storage for media files, Key Vault for secrets management, Monitor and Application Insights for observability, and Communication Services for transactional emails.

Technology Stack

Django 5+, Django REST Framework, PostgreSQL 15, Redis, Celery, React, Azure Blob Storage, Azure Key Vault, Azure Monitor, InsightFace AI, pgvector, Great Expectations, NumPy / Pandas, Docker, ODK Central API, KoBoToolbox API