Gig Board

RFP number

ACS404

Date posted

2026-05-06 15:11:10

Timeline start date

2026-05-11

Timeline end date

2026-05-30

Gig Marketplace partnership

AC:Studio

Gig category

Software Developer

Budget

$2700

Company Name

LandLogic solutions Inc.

Project overview

Building Code Markdown Extraction Engine

The objective of this project is to develop an automated data extraction pipeline that converts complex, multi-jurisdictional building code documents (including the Ontario Building Code, the National Building Code of Canada, and various provincial codes) into a structured, searchable library of Markdown files. By segmenting these codes into individual files based on their internal hierarchy (Sections, Parts, and Clauses), we aim to create a machine-readable dataset optimized for documentation, version control, and integration into LLM-based tools or internal knowledge bases.

Scope of work

1. Discovery & Document Mapping
- Analyze the structural layout of the Ontario Building Code, the National Building Code, and the other provincial codes provided.
- Define a standardized Markdown schema that accommodates "Parts," "Sections," and "Clauses" across different jurisdictions.

2. Development of Extraction Engine
- Build a script (preferably Python) to handle high-fidelity PDF-to-text conversion.
- Implement logic to preserve complex elements, specifically tables, bulleted lists, and mathematical formulas.
- Develop an automated "splitting" mechanism that creates a new Markdown file for each logical section of the code.

3. Data Formatting & Cleaning
- Convert all extracted text into clean Markdown syntax.
- Insert YAML front-matter or metadata headers into each file (e.g., Province: Ontario, Year: 2024, Section: 3.2.1).
- Ensure all cross-references within the text (e.g., "See Section 4.1.2") are formatted for future hyperlinking.

4. Quality Assurance & Validation
- Generate a verification report to confirm 100% coverage of the source documents (no missing sections).
- Perform a manual audit of a sample set (e.g., 5-10 sections) to ensure formatting accuracy against the original PDFs.

5. Delivery & Handoff
- Provide an organized folder structure containing all generated .md files.
- Deliver the source code for the extraction tool along with a brief README.md explaining how to run it on future PDF updates.
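
For illustration only, the splitting, front-matter, cross-reference, and coverage steps might be prototyped along the lines below. This is a minimal sketch, not a required design: it assumes the PDF has already been converted to plain text, that section headings start a line with a dotted number such as "3.2.1. Title", and that each section is written to a file named after its number. The function names, the heading pattern, and the link format are all hypothetical.

```python
import re

# Assumed heading shape: a dotted section number at the start of a line,
# e.g. "3.2.1. General". Real code documents will need a richer pattern.
HEADING = re.compile(r"^(\d+(?:\.\d+)+)\.?[ \t]+(\S.*)$", re.MULTILINE)
# Assumed cross-reference shape: "Section 4.1.2".
XREF = re.compile(r"\bSection (\d+(?:\.\d+)+)")


def split_sections(text):
    """Return a list of (number, title, body) tuples, one per heading found."""
    matches = list(HEADING.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        # A section body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(1), m.group(2).strip(), text[m.end():end].strip()))
    return sections


def link_xrefs(body):
    """Turn "Section 4.1.2" into a Markdown link to a hypothetical 4.1.2.md file."""
    return XREF.sub(lambda m: f"[Section {m.group(1)}]({m.group(1)}.md)", body)


def to_markdown(number, title, body, province, year):
    """Build one Markdown document with YAML front-matter for a section."""
    front = "\n".join([
        "---",
        f"Province: {province}",
        f"Year: {year}",
        # Quoted so YAML parsers keep the number as a string, not a float.
        f'Section: "{number}"',
        "---",
    ])
    return f"{front}\n\n# {number} {title}\n\n{link_xrefs(body)}\n"


def missing_sections(expected, sections):
    """Coverage check: expected section numbers that were not extracted."""
    found = {number for number, _, _ in sections}
    return [number for number in expected if number not in found]


if __name__ == "__main__":
    sample = (
        "3.2.1. General\n"
        "Buildings shall conform. See Section 3.2.2 for exceptions.\n\n"
        "3.2.2. Exceptions\n"
        "Listed here.\n"
    )
    for number, title, body in split_sections(sample):
        print(to_markdown(number, title, body, province="Ontario", year=2024))
```

A line-anchored pattern like this would misfire on body lines that happen to begin with a section number, and it does nothing for tables or formulas, so the list of expected section numbers used by the coverage check would ideally come from an independent source such as the document's table of contents.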

Preferred Skills or Experience

- Data Science
- Software Development
- GIS Data
- Experience with bylaws and code documents like building codes

Submission details

Awarded