A gated, deep-technical path — from raw fundamentals to a data engine scraping 50+ job platforms, ending with your code live on rozgar.codeaza.org.
One week to become dangerous across the whole stack. No passive tutorials — each day ends with a committed, running artifact. By Friday: a Next.js frontend calling a typed FastAPI backend, persisting to Postgres, populated with data you scraped yourself.
mypy, dataclasses vs Pydantic models, async/await and the event loop, generators, context managers, uv for envs, ruff for lint/format. Ship a typed async CLI.
httpx sync/async, HTML parsing with selectolax/BeautifulSoup, CSS/XPath selectors, robots.txt & rate limits, Playwright for JS pages.
You don't write 50 throwaway scripts; you design a framework. A BaseScraper contract, a normalized Job schema every source maps into, a source registry, and a raw→parsed two-layer store. Adding platform #51 becomes a config + adapter, not a rewrite.
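A minimal sketch of the contract-plus-registry idea, under stated assumptions: the `register` decorator, the `REGISTRY` dict, the `run()` helper and the `DemoBoard` adapter are hypothetical names, the Job schema is trimmed to four fields, and a stdlib dataclass stands in for the Pydantic model so the sketch runs with no installs. The fetch step returns static text instead of making an HTTP call.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Normalized record every source maps into (trimmed to four fields)."""
    title: str
    org: str
    source: str
    source_url: str


class BaseScraper(ABC):
    """Contract: fetch() -> parse() -> normalize() -> validate()."""

    source: str = ""  # filled in by @register

    @abstractmethod
    def fetch(self) -> str:
        """Return the raw page text for this source."""

    @abstractmethod
    def parse(self, raw: str) -> list[dict[str, str]]:
        """Turn raw text into source-specific rows."""

    def normalize(self, row: dict[str, str]) -> Job:
        # Map a source-specific row into the shared schema.
        return Job(
            title=row["title"].strip(),
            org=row["org"].strip(),
            source=self.source,
            source_url=row["url"].strip(),
        )

    def validate(self, job: Job) -> Job:
        # Reject records that are missing the fields we cannot do without.
        if not job.title or not job.source_url:
            raise ValueError(f"incomplete job from {self.source!r}")
        return job

    def run(self) -> list[Job]:
        rows = self.parse(self.fetch())
        return [self.validate(self.normalize(row)) for row in rows]


# Hypothetical registry: source name -> scraper class.
REGISTRY: dict[str, type[BaseScraper]] = {}


def register(name: str):
    """Class decorator that files a scraper under `name`."""
    def wrap(cls: type[BaseScraper]) -> type[BaseScraper]:
        cls.source = name
        REGISTRY[name] = cls
        return cls
    return wrap


@register("demo_board")
class DemoBoard(BaseScraper):
    def fetch(self) -> str:
        # Static text stands in for an HTTP call so the sketch runs offline.
        return "Data Engineer | Example Org | https://example.org/jobs/1"

    def parse(self, raw: str) -> list[dict[str, str]]:
        rows = []
        for line in raw.splitlines():
            title, org, url = line.split("|")
            rows.append({"title": title, "org": org, "url": url})
        return rows
```

Calling `REGISTRY["demo_board"]().run()` walks the whole lifecycle. A new platform only needs its own `fetch()` and `parse()`; everything downstream is shared.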
BaseScraper ABC: fetch() → parse() → normalize() → validate() lifecycle
Job Pydantic model: title, org, location, salary_min/max, category, employment_type, source, source_url, posted_at, deadline, raw_hash
raw_pages + typed jobs, so re-parsing never re-fetches
dateparser, regex + heuristics)
Push past 50 sources, including JS-rendered boards and anti-bot walls. This is where you learn scraping at scale is a reliability problem, not a parsing one, and where deduplication collapses the same job across 5 boards into one clean record.
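One way to picture the "same job across 5 boards" collapse is an exact-match fingerprint over normalized fields. This is only a sketch: the `fingerprint` and `dedupe` helpers and the choice of title, org and location as the key are assumptions, not the program's actual rule, and exact matching will miss near-duplicates that a fuzzy pass would catch.

```python
import hashlib


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences vanish."""
    return " ".join(text.lower().split())


def fingerprint(title: str, org: str, location: str) -> str:
    """Hypothetical dedup key: SHA-256 over the normalized identifying fields."""
    key = "|".join(_norm(part) for part in (title, org, location))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe(jobs: list[dict]) -> list[dict]:
    """Keep the first record seen for each fingerprint, preserving order."""
    seen: dict[str, dict] = {}
    for job in jobs:
        key = fingerprint(job["title"], job["org"], job["location"])
        seen.setdefault(key, job)
    return list(seen.values())
```

Keeping the first record seen is an arbitrary tie-break; a real pipeline might instead prefer the source with the most complete fields.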
tenacity), proxy rotation, polite concurrency caps
structlog) with per-run, per-source correlation IDs
raw_hash + canonical URL
rapidfuzz) + content shingling
The pipeline can't live on a laptop. It becomes infrastructure: scheduled, parallelised, containerised, and fronted by a clean read API that Rozgar's frontend will consume. This is the handoff from data engine to product backend.
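Parallelising the pipeline still has to stay polite and survive flaky sources. A rough sketch of retries with exponential backoff under a concurrency cap, using only asyncio: the `with_retries` and `run_all` helpers and their parameters are hypothetical, and a library such as tenacity would normally replace the hand-rolled retry loop.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await `call`, retrying on any exception with jittered exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)
    raise RuntimeError("unreachable")  # keeps type checkers satisfied


async def run_all(
    sources: dict[str, Callable[[], Awaitable[T]]],
    max_concurrency: int = 5,
    base_delay: float = 0.5,
) -> dict[str, T | Exception]:
    """Run every source with at most `max_concurrency` in flight at once."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(name: str, call: Callable[[], Awaitable[T]]):
        async with sem:
            try:
                return name, await with_retries(call, base_delay=base_delay)
            except Exception as exc:
                # One dead source must not sink the whole run.
                return name, exc

    pairs = await asyncio.gather(*(one(n, c) for n, c in sources.items()))
    return dict(pairs)
```

Returning the exception per source, rather than raising, lets a run report which sources failed while the rest still land.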
GET /jobs with filters (category, location, gov/private, remote, deadline), cursor pagination, sort
pg_trgm, tsvector) over title/org/description
Build the core of rozgar.codeaza.org: the job feed. Fast search, filters a real job-seeker understands, infinite scroll, and SEO-friendly detail pages, all served from your week-4 API.
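Infinite scroll leans on the API's cursor pagination. A small keyset-style sketch, with assumptions flagged: an in-memory list stands in for Postgres, the sort key is (`posted_at`, `id`) newest first, and `encode_cursor`, `decode_cursor` and `list_jobs` are hypothetical helpers rather than the real endpoint.

```python
from __future__ import annotations

import base64
import json


def encode_cursor(posted_at: str, job_id: int) -> str:
    """Pack the last row's sort key into an opaque, URL-safe token."""
    raw = json.dumps([posted_at, job_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple[str, int]:
    posted_at, job_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return posted_at, job_id


def list_jobs(
    jobs: list[dict],
    limit: int = 2,
    cursor: str | None = None,
) -> tuple[list[dict], str | None]:
    """Return one page (newest first) plus the cursor for the next page."""
    # ISO-8601 date strings sort correctly as plain strings; id breaks ties.
    ordered = sorted(jobs, key=lambda j: (j["posted_at"], j["id"]), reverse=True)
    if cursor is not None:
        last = decode_cursor(cursor)
        # Keep only rows strictly after the last row the client already saw.
        ordered = [j for j in ordered if (j["posted_at"], j["id"]) < last]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (
        encode_cursor(page[-1]["posted_at"], page[-1]["id"]) if has_more else None
    )
    return page, next_cursor
```

Unlike offset pagination, the cursor pins the position to a row, so jobs scraped while the user scrolls do not shift or repeat earlier pages.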
The feature that makes Rozgar sticky. A user saves a search (“BPS-17 govt jobs in Punjab”) and gets pinged the moment a match is scraped. This closes the loop: scraper → dedup → match → alert → user opens Rozgar.
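The match step of that loop can be sketched as a pure function from (saved searches, new job) to the users who should be pinged. Everything here is an assumption for illustration: the `SavedSearch` shape, the keyword-in-title rule, and the `users_to_alert` helper are hypothetical, and a real matcher would likely run as a database query.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SavedSearch:
    """Hypothetical saved search: all keywords must appear, location is optional."""
    user_id: int
    keywords: tuple[str, ...]
    location: Optional[str] = None


def matches(search: SavedSearch, job: dict) -> bool:
    """True when every keyword appears in the job's title or category."""
    haystack = f"{job['title']} {job.get('category', '')}".lower()
    if not all(keyword.lower() in haystack for keyword in search.keywords):
        return False
    if search.location and search.location.lower() != job["location"].lower():
        return False
    return True


def users_to_alert(searches: list[SavedSearch], job: dict) -> list[int]:
    """User ids to notify for one newly scraped job, each listed once."""
    return sorted({s.user_id for s in searches if matches(s, job)})
```

For the example in the text, a search stored as `SavedSearch(1, ("bps-17",), "Punjab")` would fire on a job titled "Assistant Director (BPS-17)" located in Punjab.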
Real products don't fall over. Make Rozgar fast, monitored, and trustworthy: caching, query tuning, error tracking, tests on the critical paths, CI, and a data-quality dashboard so pipeline health is visible at a glance.
Final week. Ship to real users, pick one metric you own (activation, alert open-rate, jobs indexed), move it, and present the full 8 weeks to the team like an engineer defending real work.
Each phase has a hard gate. Miss it and we course-correct fast — nobody drifts for 8 weeks. A strong finish converts to a full-time offer.