Arman Hossain

Web Scraping & Data Extraction Specialist

"If humans can access it, bots can too.
Just takes some coffee and determination."

Email LinkedIn GitHub Blog

The Origin Story

2000 I started programming at age eight with Visual Basic 6. Growing up in rural Bangladesh with limited internet access, I quickly realized that web scraping wasn't just a skill—it was survival. The cost of staying online was simply too high.

2004 I began scraping websites using VB6's MSHTML library. By downloading tutorials and forum content for offline reading, I saved enough money to keep learning. That necessity became a passion—I started exploring every tool I could find: cURL, Wget, sockets, browser macros, and later Selenium.

2008 I started building mobile apps for GetJar and Nokia's Ovi Store using J2ME. By then, I was proficient in Java, PHP, and Python. The transition to Android app development soon followed.

2014 I built an enterprise-level search engine with a web-crawling operation that processed 10-25 million pages daily—handling 4-8 TB of data. Most of the infrastructure ran on Apache Solr, with scrapers written in Python, PHP, and C. Anti-bot detection, CAPTCHAs, rate limiting—I'd learned to work around them all.

2025 To sum it up: I started programming because I wanted to scrape websites. It's been about 25 years now, and I never really stopped. Web scraping and reverse engineering aren't just what I do—they're why I code.

What I Do

  • Web Scraping & Large-Scale Data Extraction
  • Browser Automation (Playwright, Puppeteer, Selenium; see the sketch below)
  • Anti-Bot Detection & Circumvention
  • API Integrations & Monitoring Systems
  • Data Pipelines & ETL Architecture
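
A minimal sketch of what the browser-automation side of this work can look like, using Playwright's sync Python API. The URL, selector, and user agent below are placeholders for illustration, not taken from any real project:

from playwright.sync_api import sync_playwright

# Render a JavaScript-heavy page in headless Chromium and pull out text content.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(user_agent="Mozilla/5.0 (compatible; example)")  # placeholder UA
    page.goto("https://example.com/listings", wait_until="networkidle")      # placeholder URL
    rows = page.locator("div.listing").all_text_contents()                   # placeholder selector
    for row in rows:
        print(row)
    browser.close()

A production job would typically layer retries, proxy rotation, and error handling on top of this skeleton, but the core loop stays the same: drive a real browser, wait for the page to settle, extract, move on.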

Tech Stack

Languages

Python, JavaScript, TypeScript, Ruby, C

Automation

Playwright, Puppeteer, Selenium, Camoufox, Scrapy

Backend

FastAPI, Express, NestJS, Rails

Frontend

Vue, React, Next.js

Databases

PostgreSQL, MongoDB, Redis, Neo4j, Apache Solr

Cloud & DevOps

GCP, AWS, Azure, Docker, Kubernetes

Experience

Terminal49 · Jan 2023 - Present
Senior Integration Engineer
California, USA (Remote)

Building data extraction solutions that connect global shipping carriers. Working with complex integrations—scraping, APIs, and real-time monitoring systems—to aggregate container tracking data from ocean carriers, terminals, and rail providers worldwide.

Python · Playwright · Anti-Bot Bypass · API Integration · Monitoring

HelloTask · May 2022 - Dec 2022
Chief Technology Officer
Dhaka, Bangladesh

Built a job platform for blue-collar workers, removing technological barriers that prevent them from accessing decent employment. Led a team working to change the blue-collar industry through accessible technology.

System Architecture · Mobile Development · Data Pipelines

Shakti Foundation · Jan 2021 - May 2022
Deputy General Manager, IT
Dhaka, Bangladesh

Worked on fintech solutions for microfinance and SMEs. Built and deployed mobile apps for the organization's ERP system serving thousands of field agents.

Java · Kotlin · .NET · SQL Server · Fintech

Riseup Labs · Sep 2019 - Dec 2020
Software Engineer / Team Lead
Dhaka, Bangladesh

Carried software products through the complete development life cycle. Collaborated with organizations including UNICEF, FAO, the Ministry of ICT, the Ministry of Health, and the Ministry of Education on data-driven projects.

Python · Data Mining · BigQuery · Government Projects

AppconSoft · Nov 2014 - Apr 2017
Software Developer
Dhaka, Bangladesh

Built the Akor search engine with enterprise-level web crawlers using Elasticsearch and Apache Solr. Processed 10-25 million pages daily with custom scrapers in Python, PHP, and C.

Apache Solr · Elasticsearch · Web Crawling · Selenium

AppsDrone · Feb 2013 - Oct 2014
Programmer
Dhaka, Bangladesh

My first professional role. Developed an Android app marketplace analytics platform with Play Store scraping and data mining capabilities.

Data Mining · Play Store Scraping · Analytics

Education

IU International University
M.Sc. Artificial Intelligence
2024 - 2026
Aarhus University
Master's, Computer Engineering
2024
Shanto-Mariam University
B.Sc. Computer Science
2016 - 2020

Bored?

View this page from your terminal or IDE.

arman@scraper:~/arman-bd.github.io$
curl -s https://arman-bd.github.io | sed '/<pre/,/<\/pre>/d' | grep 'data-cli=' | sed -n \
  -e 's/.*data-cli="name">\([^<]*\).*/\x1b[32m============\n\1\n============\x1b[0m/p' \
  -e 's/.*data-cli="title">\([^<]*\).*/\x1b[36m\1\x1b[0m/p' \
  -e 's/.*data-cli="tagline" data-text="\([^"]*\)".*/\x1b[90m\1\x1b[0m/p' \
  -e 's/.*data-cli="link" data-label="\([^"]*\)" href="mailto:\([^"]*\)".*/  \1: \2/p' \
  -e 's/.*data-cli="link" data-label="\([^"]*\)" href="\([^"]*\)".*/  \1: \2/p' \
  -e 's/.*data-cli="section">\([^<]*\).*/\n\x1b[33m[\1]\x1b[0m/p' \
  -e 's/.*data-cli="story" data-year="\([^"]*\)" data-text="\([^"]*\)".*/  \x1b[36m\1:\x1b[0m \2/p' \
  -e 's/.*data-cli="item">\([^<]*\).*/  - \1/p' \
  -e 's/.*data-cli="project" data-name="\([^"]*\)" data-stars="\([^"]*\)" href="\([^"]*\)".*/  \x1b[36m* \1 (\2): \3\x1b[0m/p' \
  -e 's/.*data-cli="exp" data-info="\([^"]*\)".*/  \x1b[1m> \1\x1b[0m/p' \
  -e 's/.*data-cli="edu" data-info="\([^"]*\)".*/  \x1b[1m> \1\x1b[0m/p' \
  -e 's/.*data-cli="blog" data-name="\([^"]*\)" href="\([^"]*\)".*/  \x1b[36m* \1: \2\x1b[0m/p' \
  -e 's/.*data-cli="footer">\([^<]*\).*/\n\x1b[90m\1\x1b[0m/p'

The same thing in Python:

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://arman-bd.github.io").text, "html.parser")
[pre.decompose() for pre in soup.find_all("pre")]

for el in soup.find_all(attrs={"data-cli": True}):
    t, v = el["data-cli"], el.get_text(strip=True)
    if t == "name": print(f"\033[32m{'='*12}\n{v}\n{'='*12}\033[0m")
    elif t == "title": print(f"\033[36m {v}\033[0m")
    elif t == "tagline": print(f"\033[90m {el.get('data-text', v)}\033[0m")
    elif t == "link": print(f"  {el['data-label']}: {el['href'].replace('mailto:','')}")
    elif t == "section": print(f"\n\033[33m[{v}]\033[0m")
    elif t == "story": print(f"  \033[36m{el['data-year']}:\033[0m {el['data-text']}")
    elif t == "item": print(f"  - {v}")
    elif t == "project": print(f"  \033[36m* {el['data-name']} ({el['data-stars']}): {el['href']}\033[0m")
    elif t == "exp": print(f"  \033[1m> {el['data-info']}\033[0m")
    elif t == "edu": print(f"  \033[1m> {el['data-info']}\033[0m")
    elif t == "blog": print(f"  \033[36m* {el['data-name']}: {el['href']}\033[0m")
    elif t == "footer": print(f"\n\033[90m{v}\033[0m")

Or in Node.js:

const cheerio = require("cheerio");
const https = require("https");

https.get("https://arman-bd.github.io", (res) => {
  let html = "";
  res.on("data", (chunk) => (html += chunk));
  res.on("end", () => {
    const $ = cheerio.load(html);
    $("pre").remove();

    $("[data-cli]").each((_, el) => {
      const $el = $(el);
      const t = $el.attr("data-cli");
      const v = $el.text().trim();

      if (t === "name") console.log(`\x1b[32m${"=".repeat(12)}\n${v}\n${"=".repeat(12)}\x1b[0m`);
      else if (t === "title") console.log(`\x1b[36m ${v}\x1b[0m`);
      else if (t === "tagline") console.log(`\x1b[90m ${$el.attr("data-text") || v}\x1b[0m`);
      else if (t === "link") console.log(`  ${$el.attr("data-label")}: ${$el.attr("href").replace("mailto:","")}`);
      else if (t === "section") console.log(`\n\x1b[33m[${v}]\x1b[0m`);
      else if (t === "story") console.log(`  \x1b[36m${$el.attr("data-year")}:\x1b[0m ${$el.attr("data-text")}`);
      else if (t === "item") console.log(`  - ${v}`);
      else if (t === "project") console.log(`  \x1b[36m* ${$el.attr("data-name")} (${$el.attr("data-stars")}): ${$el.attr("href")}\x1b[0m`);
      else if (t === "exp") console.log(`  \x1b[1m> ${$el.attr("data-info")}\x1b[0m`);
      else if (t === "edu") console.log(`  \x1b[1m> ${$el.attr("data-info")}\x1b[0m`);
      else if (t === "blog") console.log(`  \x1b[36m* ${$el.attr("data-name")}: ${$el.attr("href")}\x1b[0m`);
      else if (t === "footer") console.log(`\n\x1b[90m${v}\x1b[0m`);
    });
  });
});
require "net/http"
require "nokogiri"

doc = Nokogiri::HTML(Net::HTTP.get(URI("https://arman-bd.github.io")))
doc.css("pre").remove

doc.css("[data-cli]").each do |el|
  t = el["data-cli"]
  v = el.text.strip

  case t
  when "name" then puts "\e[32m#{"=" * 12}\n#{v}\n#{"=" * 12}\e[0m"
  when "title" then puts "\e[36m #{v}\e[0m"
  when "tagline" then puts "\e[90m #{el["data-text"] || v}\e[0m"
  when "link" then puts "  #{el["data-label"]}: #{el["href"].sub("mailto:", "")}"
  when "section" then puts "\n\e[33m[#{v}]\e[0m"
  when "story" then puts "  \e[36m#{el["data-year"]}:\e[0m #{el["data-text"]}"
  when "item" then puts "  - #{v}"
  when "project" then puts "  \e[36m* #{el["data-name"]} (#{el["data-stars"]}): #{el["href"]}\e[0m"
  when "exp" then puts "  \e[1m> #{el["data-info"]}\e[0m"
  when "edu" then puts "  \e[1m> #{el["data-info"]}\e[0m"
  when "blog" then puts "  \e[36m* #{el["data-name"]}: #{el["href"]}\e[0m"
  when "footer" then puts "\n\e[90m#{v}\e[0m"
  end
end