Social Registry

Social Registry (SR) is an independent module offered by OpenG2P for creating registries of individuals and groups of people along with their demographic data. Its advanced features make the SR interoperable and help it fit easily into a country's digital public infrastructure (DPI).

Some of the key benefits of using an SR are:

  • Issue Verifiable Credentials (VCs) to registrants

  • Share data with other departments and organizations in a standardized manner, avoiding repeated collection of the same data

  • Give individuals control over their own data

  • Attestation

  • Visualization and analysis of social data

Registry update mechanisms

  • Individual login and update

  • Admin login and update

  • Bulk update using CSV

  • Import of data from other sources

  • Offline update using ODK Central

Functionality and features

  • Registry of human demographic data

  • Should be a trusted source of truth

    • Attestation

    • Verifiable Credentials: should be able to issue VC

  • A person should have control over their own data and be able to update it (self-service)

  • Relationships between people

  • Groups and Households

  • Privacy & security of data (using MOSIP encryption modules)

  • Should be possible to share this data with others (DPI)

    • Compliant to standards like DCI/G2P Connect/GovStack

  • Should be possible to add fields in the registry

  • Timestamped data

  • Change log

  • Multiple versions of person record

  • Reporting (Statistics)

Design

Change log

The change log can be built using the Odoo OCA package Audit Log. It would record the set of changes made to any field(s) of the registry.
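As a rough illustration of what a field-level change log might record, the sketch below compares two snapshots of a registrant record and emits one entry per changed field. It does not use the OCA Audit Log module's API; the function name `diff_record` and the entry keys are hypothetical and chosen only for this example.

```python
from datetime import datetime, timezone
from typing import Any


def diff_record(old: dict[str, Any], new: dict[str, Any], changed_by: str) -> list[dict[str, Any]]:
    """Return one change-log entry per field whose value differs between old and new."""
    changed_at = datetime.now(timezone.utc).isoformat()
    entries = []
    # Look at every field present in either snapshot, in a stable order.
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before != after:
            entries.append({
                "field": field,          # hypothetical key names
                "old_value": before,
                "new_value": after,
                "changed_by": changed_by,
                "changed_at": changed_at,
            })
    return entries


# Sample data, invented for illustration.
old = {"name": "Asha", "district": "North"}
new = {"name": "Asha", "district": "South", "phone": "555-0100"}
log = diff_record(old, new, changed_by="admin")
```

Here `log` holds two entries, one for `district` and one for the newly added `phone` field, all sharing the same timestamp.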

Multiple versions of data

Multiple versions of a person's record may arise in the following scenarios:

  • Change in field value

  • A new, updated record arrives for the person; this is termed a named version (typically from surveys)

  • Feedback call from the connected applications

Proposed solution: use Elasticsearch as the backbone for storing and querying multiple versions of records. The key technology components and strategies discussed are:

  • Debezium configuration: Debezium will be configured to capture real-time changes from the database's Write-Ahead Logs (WALs), ensuring that any modifications or additions to the data are promptly recorded.

  • Elasticsearch setup: Elasticsearch will serve as the primary destination for streaming the captured changes. Leveraging its indexing capabilities, Elasticsearch will efficiently organize and store the data, facilitating quick retrieval and analysis.

  • Indexing strategy: An indexing strategy will be devised to optimize the storage and retrieval of captured data. This strategy will accommodate multiple versions of records, ensuring that historical data remains accessible and searchable.

  • Authorization implementation: Authorization mechanisms will be implemented within Elasticsearch. This will control access to the API endpoints, ensuring that only authorized users can interact with the data.

  • API endpoints configuration: API endpoints will be configured in Elasticsearch to expose the captured data to authorized users. These endpoints will provide access to multiple versions of records, enabling users to retrieve and analyze data as needed. A client should be able to fetch records by named version or by any other parameter, such as a timeframe.
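To make the "fetch by named version or timeframe" requirement concrete, here is a minimal sketch that builds an Elasticsearch query body for a registrant's versions. The field names `registrant_id`, `version_name` and `captured_at` are assumptions (the index mapping is not defined in this document), and `build_version_query` is a hypothetical helper, not part of any existing API.

```python
from typing import Any, Optional


def build_version_query(
    registrant_id: str,
    version_name: Optional[str] = None,
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> dict[str, Any]:
    """Build a query body selecting versions of one registrant's record."""
    # Assumed mapping: registrant_id and version_name are keyword fields,
    # captured_at is a date field.
    filters: list[dict[str, Any]] = [{"term": {"registrant_id": registrant_id}}]
    if version_name is not None:
        filters.append({"term": {"version_name": version_name}})
    time_range: dict[str, str] = {}
    if start is not None:
        time_range["gte"] = start
    if end is not None:
        time_range["lte"] = end
    if time_range:
        filters.append({"range": {"captured_at": time_range}})
    return {
        "query": {"bool": {"filter": filters}},
        "sort": [{"captured_at": {"order": "desc"}}],  # newest version first
    }


# Fetch a named version (the version name is invented for illustration).
by_name = build_version_query("REG-1", version_name="survey-2024")
# Fetch every version captured within a timeframe.
by_time = build_version_query("REG-1", start="2024-01-01", end="2024-06-30")
```

The resulting dictionaries could be sent as the body of a search request against whichever index ends up holding the captured changes.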

Relationships between people

The relationship functionality would primarily address family relationships.

Parent info for a registrant

  • Parent1Id

  • Parent2Id

  • IsAdopted

  • GuardianId

Spouse Relationship (new table)

  • Person1Id

  • Person2Id

  • IsMarried

  • MarriedOn

  • IsSeparated

  • SeparatedOn

The current thinking is to represent all possible relationships using the data structure above.
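The two structures above can be sketched as plain data classes, together with one example of deriving a further relationship (siblings) from the parent fields. This is only an illustration: the snake_case attribute names, the `registrant_id` key and the `siblings` helper are assumptions, not the actual registry schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ParentInfo:
    """Parent info stored for a registrant."""
    registrant_id: str  # hypothetical key identifying the registrant
    parent1_id: Optional[str] = None
    parent2_id: Optional[str] = None
    is_adopted: bool = False
    guardian_id: Optional[str] = None


@dataclass
class SpouseRelationship:
    """One row of the proposed spouse relationship table."""
    person1_id: str
    person2_id: str
    is_married: bool = False
    married_on: Optional[date] = None
    is_separated: bool = False
    separated_on: Optional[date] = None


def siblings(registrant_id: str, parent_rows: list[ParentInfo]) -> list[str]:
    """Derive siblings: other registrants who share at least one parent."""
    by_id = {row.registrant_id: row for row in parent_rows}
    me = by_id[registrant_id]
    my_parents = {p for p in (me.parent1_id, me.parent2_id) if p}
    return sorted(
        row.registrant_id
        for row in parent_rows
        if row.registrant_id != registrant_id
        and my_parents & {row.parent1_id, row.parent2_id}
    )


# Sample data, invented for illustration.
rows = [
    ParentInfo("C1", parent1_id="P1", parent2_id="P2"),
    ParentInfo("C2", parent1_id="P1", parent2_id="P3"),
    ParentInfo("C3", parent1_id="P4"),
]
couple = SpouseRelationship("P1", "P2", is_married=True, married_on=date(2010, 5, 1))
c1_siblings = siblings("C1", rows)
```

In this sample, `C1` and `C2` share parent `P1`, so each is reported as the other's sibling, while `C3` has none.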

Groups and Households

Attestation

Attestation table

  • id of the field

  • status (NEW, ATTESTED, REJECTED ..)

  • attested by

  • attestation datetime

  • comments

The status fields will come from business processes and real use cases.

User interface

UI required for the following:

  • Person to log in, view and update records

  • Admin to view and attest fields with comments

  • Download of CSV for chosen fields of registry

  • Upload of attested CSV

Bulk attestation

We should be able to download a CSV from the registry, apply bulk attestation, and upload the CSV back. The upload should trigger an update of the registry, the change log and the attestation table.

  • Attestation table

    • id of the field

    • status (NEW, ATTESTED, REJECTED ..)

    • attested by

    • attestation datetime

    • comments
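The download, attest and upload round trip could look roughly like the sketch below, which uses only Python's standard `csv` module and an in-memory dictionary as a stand-in for the attestation table. The column names, the `export_for_attestation` and `import_attested` helpers, and the fixed status set are assumptions; the sketch updates only the attestation table and returns the changed ids, which a real implementation could use to drive the registry and change log updates.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed status set; the real values will come from business processes.
VALID_STATUSES = {"NEW", "ATTESTED", "REJECTED"}


def export_for_attestation(rows):
    """Write attestation rows to CSV text for offline review."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["field_id", "status", "comments"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def import_attested(csv_text, attested_by, table):
    """Apply an uploaded CSV to the attestation table; return the ids that changed."""
    changed = []
    now = datetime.now(timezone.utc).isoformat()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"] not in VALID_STATUSES:
            raise ValueError(f"unknown status: {row['status']!r}")
        current = table[row["field_id"]]
        if current["status"] != row["status"]:
            current.update(
                status=row["status"],
                attested_by=attested_by,
                attested_at=now,
                comments=row["comments"],
            )
            changed.append(row["field_id"])
    return changed


# In-memory stand-in for the attestation table, keyed by field id.
table = {
    "F1": {"field_id": "F1", "status": "NEW", "attested_by": None,
           "attested_at": None, "comments": ""},
    "F2": {"field_id": "F2", "status": "NEW", "attested_by": None,
           "attested_at": None, "comments": ""},
}
csv_text = export_for_attestation(table.values())
# Simulate a reviewer attesting F1 in the downloaded file.
edited = csv_text.replace("F1,NEW,", "F1,ATTESTED,verified against ID card")
changed = import_attested(edited, attested_by="admin", table=table)
```

Rejecting unknown status values at upload time keeps a mistyped CSV cell from silently corrupting the table.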

Search (WIP)

SR may contain several million records (say, 20 million) capturing demographic fields of individuals. The number of demographic fields may be large, say 60-70 columns in the database. We need a quicker search based on various columns. The following methods are proposed for faster search of large data in the social registry.

  1. Indexing: Index columns based on the most frequent queries.

  2. OpenSearch: Use the reporting infrastructure to make all the data available in OpenSearch.

  3. Citus: Consider splitting DB into multiple databases using Citus. This is an infrastructure layer.

Indexing should be the first approach, but if the number of columns is large, indexing will bloat the DB. In addition to indexing, we must make data available in OpenSearch. The data sent to OpenSearch would not contain PII. Any query on the SR would first be routed to OpenSearch to obtain a list of IDs. Given that list of IDs, a further query on the DB may be performed to fetch the results. The OpenSearch framework will also help us generate reports and real-time stats. One issue with this approach is ensuring that no data is missing from OpenSearch: some reconciliation will have to be done periodically to ensure that all IDs present in the DB are available in OpenSearch.
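The two-step lookup and the periodic reconciliation described above can be sketched with in-memory dictionaries standing in for the database and the OpenSearch index. No real OpenSearch or database client is used here; the function names and the sample fields are hypothetical.

```python
def search_ids(index, **criteria):
    """Step 1: match non-PII fields in the search index and return registrant IDs."""
    return [
        doc_id
        for doc_id, doc in index.items()
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def fetch_records(db, ids):
    """Step 2: fetch the full records (including PII) from the database by ID."""
    return [db[i] for i in ids if i in db]


def missing_from_index(db, index):
    """Reconciliation: IDs present in the database but absent from the index."""
    return sorted(set(db) - set(index))


# Stand-in for the database: full records, including PII such as the name.
db = {
    1: {"id": 1, "name": "Asha", "district": "North"},
    2: {"id": 2, "name": "Ravi", "district": "South"},
    3: {"id": 3, "name": "Meera", "district": "North"},
}
# Stand-in for the index: non-PII fields only; record 3 has not been synced yet.
index = {
    1: {"district": "North"},
    2: {"district": "South"},
}

ids = search_ids(index, district="North")
records = fetch_records(db, ids)
missing = missing_from_index(db, index)
```

Because record 3 never reached the index, the search misses it even though it matches, which is exactly the gap the periodic reconciliation is meant to detect.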

API

The SR exposes the following REST APIs:

TBD


Copyright © 2024 OpenG2P. This work is licensed under Creative Commons Attribution International License CC-BY-4.0 unless otherwise noted.