Configuration Management (Django)
Effective management of scraping targets, keywords, and settings is crucial for controlling the system's behavior and adapting to new requirements.
Role & Purpose
- Acts as the central repository for all operational parameters guiding the scraping process.
- Provides an interface for users/administrators to define what to scrape, where to scrape it from, and how.
- Stores metadata about target websites needed for successful scraping (e.g., requires Playwright, specific selectors, date filter parameters).
Technology Choice: Django + PostgreSQL
- Django Framework: Chosen primarily for its outstanding built-in Admin Interface. This automatically generates a web UI for managing database models, drastically reducing the effort needed to build a configuration tool.
- PostgreSQL Database: Serves as the backing store for the configuration data. Chosen for reliability, robustness, and compatibility with Django (a minimal connection sketch follows below).
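A minimal sketch of wiring Django to PostgreSQL in settings.py; the database name, credentials, and environment-variable names shown here are placeholders rather than part of the actual deployment:

```python
# settings.py (sketch) -- database name, credentials, and env-var names are placeholders
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",   # PostgreSQL backend shipped with Django
        "NAME": os.environ.get("CONFIG_DB_NAME", "scraper_config"),
        "USER": os.environ.get("CONFIG_DB_USER", "scraper"),
        "PASSWORD": os.environ.get("CONFIG_DB_PASSWORD", ""),
        "HOST": os.environ.get("CONFIG_DB_HOST", "localhost"),
        "PORT": os.environ.get("CONFIG_DB_PORT", "5432"),
    }
}
```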
```mermaid
classDiagram
class Website {
+int id PK
+string name
+string base_url
+string search_url_template
+bool requires_playwright
+string pagination_type
+bool is_active
+float base_delay
+json selectors (nullable)
+scrape_targets : list~ScrapeTarget~
}
class Keyword {
+int id PK
+string text
+string type (nullable)
+scrape_targets : list~ScrapeTarget~
}
class Location {
+int id PK
+string text
+scrape_targets : list~ScrapeTarget~
}
class ScrapeTarget {
+int id PK
+bool is_active
+string frequency
+datetime last_scheduled (nullable)
+website : Website
+keyword : Keyword
+location : Location
%% Represents the specific combination to scrape
%% Implicit FKs: website_id, keyword_id, location_id
}
Website "1" -- "*" ScrapeTarget : defines >
Keyword "1" -- "*" ScrapeTarget : uses >
Location "1" -- "*" ScrapeTarget : specifies >
Key Data Models (Conceptual)
The Django application would define models roughly corresponding to:
Website Model
- name: Human-readable name (e.g., "LinkedIn Jobs").
- base_url: Base URL of the site.
- search_url_template: URL structure for performing searches (with placeholders for keywords, location, etc.).
- robots_txt_status: Result of the robots.txt check (e.g., Allowed, Disallowed, Not Checked).
- requires_playwright: Boolean flag indicating if browser automation is needed.
- pagination_type: How the site paginates results (e.g., 'next_link', 'load_more_button', 'infinite_scroll', 'api').
- date_filter_param: Specific URL parameter or interaction logic for date filtering.
- selectors: (Potentially JSON/TextField) CSS/XPath selectors for key elements (job links, title, description), though parsing logic might be kept separate.
- is_active: Boolean flag to easily enable/disable scraping for this site.
- base_delay: Default download delay for this site.
- ... other site-specific metadata.
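A hedged sketch of how these fields might translate into a Django model; the max lengths and defaults are illustrative assumptions rather than part of the design above:

```python
# models.py (sketch) -- field names mirror the list above; max lengths and defaults are assumptions
from django.db import models

class Website(models.Model):
    name = models.CharField(max_length=200)                   # e.g. "LinkedIn Jobs"
    base_url = models.URLField()
    search_url_template = models.CharField(max_length=500)    # holds keyword/location placeholders
    robots_txt_status = models.CharField(max_length=20, default="not_checked")
    requires_playwright = models.BooleanField(default=False)
    pagination_type = models.CharField(max_length=30, default="next_link")
    date_filter_param = models.CharField(max_length=100, blank=True)
    selectors = models.JSONField(null=True, blank=True)        # CSS/XPath selectors for key elements
    is_active = models.BooleanField(default=True)
    base_delay = models.FloatField(default=2.0)                # seconds between requests

    def __str__(self):
        return self.name
```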
Keyword Model
- text: The keyword/phrase to search for (e.g., "Data Engineer").
- type: (Optional) Category like 'Role', 'Skill', 'Tool'.
Location Model
- text: The location to search within (e.g., "London, UK", "Remote").
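The Keyword and Location models are small; a sketch under the same assumptions (the unique constraints are illustrative, not stated above):

```python
# models.py (sketch, continued) -- unique constraints are illustrative assumptions
class Keyword(models.Model):
    text = models.CharField(max_length=200, unique=True)   # e.g. "Data Engineer"
    type = models.CharField(max_length=50, blank=True)     # optional category: 'Role', 'Skill', 'Tool'

    def __str__(self):
        return self.text


class Location(models.Model):
    text = models.CharField(max_length=200, unique=True)   # e.g. "London, UK" or "Remote"

    def __str__(self):
        return self.text
```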
ScrapeTarget Model (Linking Table)
- website: Foreign Key to Website.
- keyword: Foreign Key to Keyword.
- location: Foreign Key to Location.
- is_active: Boolean flag for this specific combination.
- frequency: How often to schedule this target (e.g., 'daily', 'hourly').
- last_scheduled: Timestamp of the last dispatch.
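A sketch of the linking model; the uniqueness constraint across the three foreign keys and the specific frequency choices are assumptions, not requirements stated above:

```python
# models.py (sketch, continued) -- uniqueness constraint and frequency choices are assumptions
class ScrapeTarget(models.Model):
    FREQUENCY_CHOICES = [("hourly", "Hourly"), ("daily", "Daily"), ("weekly", "Weekly")]

    website = models.ForeignKey(Website, on_delete=models.CASCADE, related_name="scrape_targets")
    keyword = models.ForeignKey(Keyword, on_delete=models.CASCADE, related_name="scrape_targets")
    location = models.ForeignKey(Location, on_delete=models.CASCADE, related_name="scrape_targets")
    is_active = models.BooleanField(default=True)
    frequency = models.CharField(max_length=20, choices=FREQUENCY_CHOICES, default="daily")
    last_scheduled = models.DateTimeField(null=True, blank=True)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["website", "keyword", "location"], name="unique_scrape_target"
            )
        ]

    def __str__(self):
        return f"{self.website} / {self.keyword} @ {self.location}"
```

Registering the four models in admin.py is enough to expose them in the built-in Admin UI mentioned earlier (the app-relative import is an assumption about project layout):

```python
# admin.py (sketch)
from django.contrib import admin
from .models import Website, Keyword, Location, ScrapeTarget

admin.site.register([Website, Keyword, Location, ScrapeTarget])
```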
Benefits of this Approach
- Centralized Control: All configuration in one place.
- User-Friendly Management: Simple configuration changes can be made through the Django Admin UI without direct database access or code deployments.
- Structured & Relational: Database enforces relationships (e.g., which keywords apply to which sites).
- Version Control: Django migrations keep schema changes tracked in version control.
- Programmatic Access: The Dispatcher service can easily query these models using Django's ORM, as sketched below.
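A hedged sketch of that access pattern, showing how a Dispatcher might pull the active targets that are due for scheduling; the app label, the due_targets helper, and the one-day cutoff are illustrative assumptions:

```python
# dispatcher query (sketch) -- app label, helper name, and cutoff are assumptions
from datetime import timedelta

from django.db.models import Q
from django.utils import timezone

from config.models import ScrapeTarget   # hypothetical app label "config"


def due_targets():
    """Active targets on active sites that have never run or were last dispatched over a day ago."""
    cutoff = timezone.now() - timedelta(days=1)
    return (
        ScrapeTarget.objects
        .select_related("website", "keyword", "location")   # fetch related config in one query
        .filter(is_active=True, website__is_active=True)
        .filter(Q(last_scheduled__isnull=True) | Q(last_scheduled__lte=cutoff))
    )
```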