News Dashboard Architecture
This project is a self-hosted, open-source news dashboard. It collects curated technical news feeds, stores articles, lets the owner triage them as new, read, saved, skipped, or archived, tracks source health, and can send a daily digest email.
The system is not a large microservice platform. It is best understood as a modular monolith with a small number of runtime units:
- news-dashboard: the FastAPI backend, which also serves the built React frontend in container and Kubernetes deployments.
- postgres: the durable production database.
- news-dashboard-ingest: a Kubernetes CronJob batch workload that runs ingestion on a schedule.
- Optional external integrations: RSS/Atom feeds, a scraped Anthropic News page, SMTP, GHCR, GitHub Actions, Keycloak SSO, and host-level Caddy.
Database Contract
PostgreSQL is the application database. Runtime code should be written directly for PostgreSQL and psycopg:
- Use %s parameters, PostgreSQL functions/operators, and ON CONFLICT upserts.
- Do not add SQLite fallbacks, database-type sniffing, placeholder translation, or generic multi-database SQL.
- Configure the app with DATABASE_URL or POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, and POSTGRES_PASSWORD.
- SQLite is allowed only as an input format for legacy migration tooling that imports old local data into PostgreSQL.
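As an illustration of this contract, here is a minimal sketch, not the code in db.py, of two things: resolving the connection setting (DATABASE_URL first, then the POSTGRES_* variables) and writing a psycopg-style upsert with %s parameters. The function names resolve_database_url and upsert_source, and the sources columns key, name, and url, are hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Any, Protocol
from urllib.parse import quote


class Cursor(Protocol):
    """The one cursor method this sketch needs (psycopg cursors provide it)."""

    def execute(self, query: str, params: Any = None) -> Any: ...


def resolve_database_url(env: Mapping[str, str] = os.environ) -> str:
    """Prefer DATABASE_URL; otherwise build a URL from the POSTGRES_* variables."""
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    # Missing POSTGRES_* variables raise KeyError instead of silently defaulting.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    port = env.get("POSTGRES_PORT", "5432")  # 5432 is PostgreSQL's standard port
    return f"postgresql://{user}:{password}@{env['POSTGRES_HOST']}:{port}/{env['POSTGRES_DB']}"


# Hypothetical column names; the real schema is created in backend/news_dashboard/db.py.
UPSERT_SOURCE_SQL = """
    INSERT INTO sources (key, name, url)
    VALUES (%s, %s, %s)
    ON CONFLICT (key) DO UPDATE SET name = EXCLUDED.name, url = EXCLUDED.url
"""


def upsert_source(cur: Cursor, key: str, name: str, url: str) -> None:
    # psycopg uses %s for every parameter type; values are never formatted into the SQL.
    cur.execute(UPSERT_SOURCE_SQL, (key, name, url))
```

With psycopg 3 this could be used as `with psycopg.connect(resolve_database_url()) as conn, conn.cursor() as cur: upsert_source(cur, "python", "Python", "https://...")`. Leaving the connection block commits the transaction.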
Runtime Topology
```mermaid
flowchart TB
User["User / Browser"]
subgraph LocalDev["Local development"]
Vite["Vite React dev server<br/>localhost:5173"]
FastAPIDev["FastAPI backend<br/>uvicorn news_dashboard.main:app"]
PostgresDev["PostgreSQL<br/>DATABASE_URL or POSTGRES_*"]
end
subgraph Container["Docker / Production image"]
App["news-dashboard container<br/>FastAPI + built React static files<br/>port 8080"]
end
subgraph Kubernetes["Kubernetes via Helm"]
Caddy["Host Caddy<br/>news.lihor.ro<br/>reverse proxy"]
Keycloak["Keycloak<br/>optional SSO under /keycloak"]
Service["K8s Service<br/>NodePort 30088 or ClusterIP"]
Deployment["news-dashboard Deployment<br/>replicas: 1"]
CronJob["K8s CronJob<br/>news-dashboard ingest<br/>every 6 hours"]
PostgresSvc["Postgres Service"]
Postgres["Postgres StatefulSet<br/>postgres:16-alpine"]
PVC["HostPath / PVC storage"]
end
subgraph External["External systems"]
Feeds["RSS / Atom feeds<br/>OpenAI, Python, GitHub, HN, etc."]
Scraped["Scraped page<br/>Anthropic News"]
SMTP["SMTP server<br/>optional digest email"]
GHCR["GitHub Container Registry"]
Actions["GitHub Actions CI/CD"]
end
User --> Vite
Vite --> FastAPIDev
FastAPIDev --> PostgresDev
User --> Caddy
Caddy --> Service --> Deployment
Caddy --> Keycloak
Deployment --> App
App --> PostgresSvc --> Postgres --> PVC
CronJob --> PostgresSvc
CronJob --> Feeds
CronJob --> Scraped
App --> Feeds
App --> Scraped
App --> SMTP
Actions --> GHCR
GHCR --> Deployment
```
Application Modules
```mermaid
flowchart LR
Browser["React UI<br/>frontend/src/App.tsx"]
APIClient["API client<br/>frontend/src/api.ts"]
subgraph Backend["Python backend: backend/news_dashboard"]
Main["main.py<br/>FastAPI routes + static frontend"]
Ingest["ingest.py<br/>feed parsing, tagging, scoring, insert"]
Scraper["scraper.py<br/>custom scraped-page handlers"]
Sources["sources.py<br/>curated source registry"]
DB["db.py<br/>PostgreSQL connection + schema"]
Scheduler["scheduler.py<br/>APScheduler ingest + digest jobs"]
Digest["digest.py<br/>daily email + mark-read token"]
CLI["cli.py<br/>init, ingest, articles"]
Migrate["migrate.py<br/>SQLite to Postgres migration"]
end
Database["articles + sources tables<br/>PostgreSQL"]
Browser --> APIClient --> Main
Main --> Ingest
Main --> DB
Main --> Scheduler
Main --> Digest
Scheduler --> Ingest
Scheduler --> Digest
CLI --> Ingest
CLI --> Sources
Ingest --> Sources
Ingest --> Scraper
Ingest --> DB
Digest --> DB
DB --> Database
```
Article Ingestion Flow
```mermaid
sequenceDiagram
participant Trigger as Trigger<br/>Fetch button / Scheduler / CronJob / CLI
participant API as FastAPI / CLI
participant Ingest as ingest_all()
participant Sources as DEFAULT_SOURCES
participant Feed as RSS/Atom or Scraper
participant DB as PostgreSQL
participant UI as React UI
Trigger->>API: POST /api/ingest or news-dashboard ingest
API->>Ingest: ingest_all()
Ingest->>DB: init_db()
Ingest->>DB: sync_sources()
Ingest->>Sources: iterate curated sources
loop each source
Ingest->>Feed: fetch RSS/Atom or scrape page
Feed-->>Ingest: entries
Ingest->>Ingest: clean HTML
Ingest->>Ingest: canonicalize URL
Ingest->>Ingest: infer tags
Ingest->>Ingest: create summary/reason
Ingest->>Ingest: calculate importance_score
Ingest->>DB: INSERT article ON CONFLICT DO NOTHING
Ingest->>DB: update source health
end
Ingest-->>API: per-source inserted counts
API-->>Trigger: inserted total + result map
UI->>API: GET /api/articles, /api/summary, /api/sources
API->>DB: query latest data
API-->>UI: articles, counts, source health
```
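The loop in the sequence above can be sketched as follows. This is a minimal, self-contained illustration, not the real ingest_all(). The names ingest_all_sketch and SourceHealth and the injected fetch/insert callables are hypothetical. They stand in for the feedparser/scraper step and the conflict-ignoring insert so that the per-source error isolation and health bookkeeping are visible.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SourceHealth:
    """In-memory stand-in for the health columns on the sources table."""

    last_checked_at: datetime | None = None
    last_success_at: datetime | None = None
    last_error: str | None = None
    last_fetched_count: int = 0
    last_inserted_count: int = 0


def ingest_all_sketch(
    sources: Iterable[str],
    fetch: Callable[[str], list[dict]],
    insert: Callable[[dict], bool],
    health: dict[str, SourceHealth],
) -> dict[str, int]:
    """Fetch each source, insert new entries, and record per-source health.

    fetch stands in for the feedparser/scraper step; insert stands in for
    INSERT ... ON CONFLICT DO NOTHING and returns True only for a new row.
    """
    results: dict[str, int] = {}
    for source in sources:
        state = health.setdefault(source, SourceHealth())
        state.last_checked_at = datetime.now(timezone.utc)
        try:
            entries = fetch(source)
            inserted = sum(1 for entry in entries if insert(entry))
        except Exception as exc:  # a broken feed must not stop the remaining sources
            state.last_error = str(exc)
            results[source] = 0
            continue
        state.last_success_at = state.last_checked_at
        state.last_error = None
        state.last_fetched_count = len(entries)
        state.last_inserted_count = inserted
        results[source] = inserted
    return results
```

The "inserted total" that the API returns would then be `sum(results.values())`. A failed source keeps its previous last_success_at, which is what lets the Sources health panel show how stale a feed is.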
User Flow
```mermaid
flowchart TD
Open["Open news.lihor.ro"] --> Load["React loads dashboard"]
Load --> Summary["GET /api/summary"]
Load --> Articles["GET /api/articles?status=new"]
Load --> Sources["GET /api/sources"]
Articles --> Inbox["Inbox tab"]
Summary --> Counts["Tab counts"]
Sources --> SourcePanel["Sources health panel"]
Inbox --> Action{"User action"}
Action --> Read["Mark read"]
Action --> Save["Save"]
Action --> Skip["Skip"]
Action --> Archive["Archive"]
Read --> Patch["PATCH /api/articles/:id/status"]
Save --> Patch
Skip --> Patch
Archive --> Patch
Patch --> DB["Update article status<br/>and timestamp"]
DB --> Reload["Reload articles + summary"]
Inbox --> Search["Search box"]
Search --> SearchAPI["GET /api/search?q=..."]
SearchAPI --> Results["Results across all statuses"]
Inbox --> Fetch["Fetch now button"]
Fetch --> Ingest["POST /api/ingest"]
Ingest --> Reload
```
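One way to compute the tab counts loaded from GET /api/summary is a single grouped query, with zeros filled in for empty tabs. The sketch below assumes that shape; the actual response format of /api/summary is not documented here, and SUMMARY_SQL and summarize are hypothetical names. The five statuses are the ones listed at the top of this document.

```python
from collections.abc import Iterable

# The triage statuses named in this document.
STATUSES = ("new", "read", "saved", "skipped", "archived")

# One grouped query instead of one COUNT(*) per tab.
SUMMARY_SQL = "SELECT status, count(*) FROM articles GROUP BY status"


def summarize(rows: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Turn (status, count) rows into a dict with a zero for every empty tab."""
    counts = dict.fromkeys(STATUSES, 0)
    for status, n in rows:
        counts[status] = n
    return counts
```

GROUP BY omits statuses with no rows, so the zero-filled dict keeps every tab label stable in the UI.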
CI/CD Flow
```mermaid
flowchart LR
Dev["Push to main / PR"] --> CI["GitHub Actions"]
CI --> Tests["Python tests<br/>pytest"]
CI --> BuildFrontend["npm run build<br/>TypeScript + Vite"]
Tests --> Publish{"main branch?"}
BuildFrontend --> Publish
Publish --> Image["Docker build<br/>React build stage + Python runtime"]
Image --> GHCR["Push ghcr.io/lihor-hub/news-dashboard:<sha>"]
GHCR --> Runner["Self-hosted runner<br/>mini PC"]
Runner --> Pull["docker pull image"]
Runner --> Secret["kubectl create/update<br/>GHCR pull secret"]
Runner --> Helm["helm upgrade --install"]
Helm --> K8s["Kubernetes namespace<br/>news-dashboard"]
K8s --> Smoke["curl localhost:30088/api/health"]
```
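Right after helm upgrade the new pod may still be starting, so a single curl can fail spuriously. The pipeline runs curl. The sketch below is a hypothetical Python alternative with retries that uses only the standard library. The function name wait_for_health and the retry defaults are assumptions, not part of the existing workflow.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, attempts: int = 10, delay: float = 3.0) -> bool:
    """Poll a health endpoint until it answers HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # the pod may still be starting; try again after a pause
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A CI step could call `wait_for_health("http://localhost:30088/api/health")` and fail the job when it returns False. Non-200 answers raise urllib.error.HTTPError, a URLError subclass, so they are retried as well.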
Important Files
- frontend/src/App.tsx: the React dashboard, tabs, filters, article cards, source health panel, search, and manual fetch button.
- frontend/src/api.ts: browser-side API wrapper around the /api/... endpoints.
- backend/news_dashboard/main.py: FastAPI app, API routes, startup/shutdown hooks, and static frontend serving.
- backend/news_dashboard/ingest.py: ingestion pipeline, URL canonicalization, source health updates, summaries, tags, and scoring.
- backend/news_dashboard/sources.py: curated source registry.
- backend/news_dashboard/scraper.py: custom scraped-page handlers, currently for Anthropic News.
- backend/news_dashboard/db.py: PostgreSQL configuration, psycopg connection handling, and schema setup.
- backend/news_dashboard/scheduler.py: in-process APScheduler jobs for ingest and digest.
- backend/news_dashboard/digest.py: daily digest email and signed mark-read links.
- backend/news_dashboard/cli.py: maintenance commands.
- Dockerfile: multi-stage build, React frontend first, Python runtime second.
- docker-compose.yml: local container topology with app plus Postgres.
- helm/news-dashboard: Kubernetes Deployment, Service, CronJob, Postgres StatefulSet, secrets, and storage.
- .github/workflows/ci.yml: tests, frontend build, image publish, and mini PC deployment.
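The signed mark-read links in digest.py make it possible to mark an article read from an email without a login session, as long as the link cannot be forged for another article. The sketch below shows one standard way to do that with an HMAC over the article id. It is an assumption about the general technique, not the project's actual token format, which may also carry an expiry. The functions sign_mark_read and verify_mark_read are hypothetical.

```python
from __future__ import annotations

import base64
import hashlib
import hmac


def _signature(article_id: int, secret: bytes) -> str:
    digest = hmac.new(secret, str(article_id).encode(), hashlib.sha256).digest()
    # URL-safe base64 without padding keeps the token short and link-friendly.
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")


def sign_mark_read(article_id: int, secret: bytes) -> str:
    """Return '<id>.<signature>' for use in a mark-read link."""
    return f"{article_id}.{_signature(article_id, secret)}"


def verify_mark_read(token: str, secret: bytes) -> int | None:
    """Return the article id when the signature matches, otherwise None."""
    id_part, _, sig_part = token.partition(".")
    if not (id_part.isascii() and id_part.isdigit()) or not sig_part:
        return None
    article_id = int(id_part)
    expected = _signature(article_id, secret)
    # compare_digest takes constant time, so timing does not reveal how much matched.
    if hmac.compare_digest(expected.encode(), sig_part.encode()):
        return article_id
    return None
```

Changing the id in a link invalidates its signature, so a recipient cannot reuse one digest link to change the status of a different article.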
Database Model
The database has two main tables:
- sources: source registry and health information, including last_checked_at, last_success_at, last_error, last_fetched_count, and last_inserted_count.
- articles: normalized article records, including source metadata, category, kind, publication/discovery timestamps, status, importance score, summary, reason, tags, and status-specific timestamps.
The schema setup in backend/news_dashboard/db.py adds generated tsvector columns and GIN indexes. User-facing search currently uses PostgreSQL-native predicates; the generated full-text index is available for ranked search without adding another runtime database.
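For readers unfamiliar with generated tsvector columns, the sketch below shows the standard PostgreSQL pattern (generated columns need PostgreSQL 12+, and the deployment uses postgres:16-alpine) and a ranked query that could use it. The column search_vector, the index name, the use of title and summary, and the helper ranked_search_params with its limit clamp are all hypothetical. The real DDL is in db.py.

```python
# Hypothetical names; the real DDL lives in backend/news_dashboard/db.py.
SEARCH_DDL = """
ALTER TABLE articles
  ADD COLUMN IF NOT EXISTS search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(summary, ''))
  ) STORED;
CREATE INDEX IF NOT EXISTS articles_search_idx ON articles USING GIN (search_vector);
"""

# websearch_to_tsquery accepts free-form input such as: postgres -mysql "full text"
RANKED_SEARCH_SQL = """
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM articles, websearch_to_tsquery('english', %s) AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT %s
"""


def ranked_search_params(q: str, limit: int = 50) -> tuple[str, int]:
    """Build psycopg parameters; the search text is never interpolated into SQL."""
    q = q.strip()
    if not q:
        raise ValueError("empty search query")
    # Clamp the page size to an assumed 1..200 range.
    return (q, max(1, min(limit, 200)))
```

Because the column is generated, PostgreSQL keeps it in sync on every INSERT and UPDATE, so the ingestion code does not have to maintain it.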
How It Works
On startup, FastAPI syncs the configured sources into the database and starts the background scheduler. The scheduler periodically calls the same ingestion pipeline used by the manual Fetch now button and the CLI. In Kubernetes, a separate CronJob also runs news-dashboard ingest every six hours.
During ingestion, each source is fetched through either feedparser for RSS/Atom feeds or a custom scraper for sources that do not expose a feed. Entries are cleaned, URLs are canonicalized to remove tracking parameters, tags are inferred from keywords, summaries and reasons are generated from available text, and rows are inserted if the URL is new. The source row is then updated with success or error health information.
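The canonicalization and keyword-tagging steps can be sketched with the standard library alone. The tracking-parameter list and the keyword table below are illustrative assumptions, not the lists in ingest.py, and the tag matching is a plain substring test.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the real list lives in ingest.py.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid", "mc_cid", "mc_eid"}

# Hypothetical keyword -> tag table for illustration.
TAG_KEYWORDS = {
    "python": ("python", "pypi", "pep "),
    "ai": ("llm", "gpt", "claude", "model"),
    "security": ("cve", "vulnerability", "exploit"),
}


def canonicalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and tracking query parameters."""
    parts = urlsplit(url.strip())
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_KEYS and not k.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", urlencode(query), "")
    )


def infer_tags(text: str) -> list[str]:
    """Return, sorted, every tag whose keywords appear in the lower-cased text."""
    lowered = text.lower()
    return sorted(tag for tag, words in TAG_KEYWORDS.items() if any(w in lowered for w in words))
```

For example, `canonicalize_url("HTTPS://Example.com/post?utm_source=x&id=5#top")` returns "https://example.com/post?id=5". Canonicalizing before the insert matters because deduplication is keyed on the URL, so two tracking variants of the same link collapse into one row.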
The React UI reads articles by status and category, displays summary counts, shows source health, and lets the user update article status. Status changes are persisted through PATCH /api/articles/{article_id}/status, and the UI reloads articles and counts afterward. Search calls /api/search and returns matching articles across statuses.
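A status change boils down to one parameterized UPDATE that also stamps the matching status-specific timestamp. The sketch below assumes column names such as saved_at and read_at, and assumes that "new" has no timestamp of its own. Both are hypothetical, since the actual column names are not listed here.

```python
# Assumed status -> timestamp column mapping; "new" has no timestamp of its own.
STATUS_TIMESTAMP_COLUMNS = {
    "read": "read_at",
    "saved": "saved_at",
    "skipped": "skipped_at",
    "archived": "archived_at",
}
VALID_STATUSES = {"new", *STATUS_TIMESTAMP_COLUMNS}


def build_status_update(article_id: int, status: str) -> tuple[str, tuple[str, int]]:
    """Return (sql, params) for PATCH /api/articles/{article_id}/status."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    sets = "status = %s"
    column = STATUS_TIMESTAMP_COLUMNS.get(status)
    if column is not None:
        # The column name comes from the fixed mapping above, never from user input;
        # now() lets PostgreSQL supply the timestamp.
        sets += f", {column} = now()"
    return f"UPDATE articles SET {sets} WHERE id = %s", (status, article_id)
```

Validating the status against a fixed set before building the SQL means the only values that reach the database travel as %s parameters.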
For production, GitHub Actions tests the Python backend, builds the frontend, builds a Docker image, pushes it to GHCR, and deploys it on a self-hosted runner with Helm. The host-level Caddy route in deploy/Caddyfile exposes the Kubernetes NodePort at news.lihor.ro and proxies /keycloak to the colocated identity provider. Authentication is enforced by the FastAPI app through local password sessions or optional Keycloak SSO; see Authentication (Keycloak).
Operational Notes
- PostgreSQL is required in every runtime environment. SQLite is only a legacy migration input for importing old local data into PostgreSQL.
- The React app is served separately only in local development. In the production image, the built frontend is served by FastAPI.
- There are two scheduling mechanisms: in-process APScheduler and the Kubernetes CronJob. If duplicate ingestion is undesirable, configure one of them as the authoritative scheduler.
- Authentication is handled by the app. Local password login is always part of the app model, and production can enable Keycloak SSO with KEYCLOAK_AUTH_ENABLED=1 plus the related KEYCLOAK_* settings documented in README.md and Authentication (Keycloak).
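Both the Keycloak switch and the choice of an authoritative scheduler are boolean environment settings. The helper below sketches one way to parse them. How the app actually reads KEYCLOAK_AUTH_ENABLED is not documented here, and NEWS_DASHBOARD_SCHEDULER_ENABLED, used in the note after the code, is a hypothetical name, not an existing setting.

```python
import os
from collections.abc import Mapping

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, env: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean switch such as KEYCLOAK_AUTH_ENABLED=1 from the environment."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in _TRUTHY
```

With a flag like this, the Helm chart could set NEWS_DASHBOARD_SCHEDULER_ENABLED=0 on the Deployment so that only the CronJob ingests, and FastAPI startup would skip starting APScheduler when `env_flag("NEWS_DASHBOARD_SCHEDULER_ENABLED", default=True)` is False.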