Arman Hossain

Web Scraping & Data Extraction Specialist

"If humans can access it, bots can too.
Just takes some coffee and determination."

Email LinkedIn GitHub Blog

The Origin Story

2000 I started programming at age eight with Visual Basic 6. Growing up in rural Bangladesh with limited internet access, I quickly realized that web scraping wasn't just a skill—it was survival. The cost of staying online was simply too high.

2004 I began scraping websites using VB6's MSHTML library. By downloading tutorials and forum content for offline reading, I saved enough money to keep learning. That necessity became a passion—I started exploring every tool I could find: cURL, Wget, sockets, browser macros, and later Selenium.

2008 I started building mobile apps for GetJar and Nokia's Ovi Store using J2ME. By then, I was proficient in Java, PHP, and Python. The transition to Android app development soon followed.

2014 I built an enterprise-level search engine with a web-crawling operation that processed 10-25 million pages daily—handling 4-8 TB of data. Most of the infrastructure ran on Apache Solr, with scrapers written in Python, PHP, and C. Anti-bot detection, CAPTCHAs, rate limiting—I'd learned to work around them all.

2025 To sum it up: I started programming because I wanted to scrape websites. It's been about 25 years now, and I never really stopped. Web scraping and reverse engineering aren't just what I do—they're why I code.

What I Do

  • Web Scraping & Large-Scale Data Extraction
  • Browser Automation (Playwright, Puppeteer, Selenium; see the sketch below)
  • Anti-Bot Detection & Circumvention
  • API Integrations & Monitoring Systems
  • Data Pipelines & ETL Architecture
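
A minimal sketch of what the browser-automation side of this work can look like, using Playwright's sync Python API. The URL, selector, and user agent below are placeholders for illustration, not taken from any real project:

from playwright.sync_api import sync_playwright

# Render a JavaScript-heavy page in headless Chromium and pull out text content.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(user_agent="Mozilla/5.0 (compatible; example)")  # placeholder UA
    page.goto("https://example.com/listings", wait_until="networkidle")      # placeholder URL
    rows = page.locator("div.listing").all_text_contents()                   # placeholder selector
    for row in rows:
        print(row)
    browser.close()

A production job would typically layer retries, proxy rotation, and error handling on top of this skeleton, but the core loop stays the same: drive a real browser, wait for the page to settle, extract, move on.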

Tech Stack

Languages

Python, JavaScript, TypeScript, Ruby, C

Automation

Playwright, Puppeteer, Selenium, Camoufox, Scrapy

Backend

FastAPI, Express, NestJS, Rails

Frontend

Vue, React, Next.js

Databases

PostgreSQL, MongoDB, Redis, Neo4j, Apache Solr

Cloud & DevOps

GCP, AWS, Azure, Docker, Kubernetes

Experience

Terminal49 · Jan 2023 - Present
Senior Integration Engineer
California, USA (Remote)

Building data extraction solutions that connect global shipping carriers. Working with complex integrations—scraping, APIs, and real-time monitoring systems—to aggregate container tracking data from ocean carriers, terminals, and rail providers worldwide.

Python · Playwright · Anti-Bot Bypass · API Integration · Monitoring

HelloTask · May 2022 - Dec 2022
Chief Technology Officer
Dhaka, Bangladesh

Built a job platform for blue-collar workers, removing technological barriers that prevent them from accessing decent employment. Led a team working to change the blue-collar industry through accessible technology.

System Architecture · Mobile Development · Data Pipelines

Shakti Foundation · Jan 2021 - May 2022
Deputy General Manager, IT
Dhaka, Bangladesh

Worked on fintech solutions for microfinance and SMEs. Built and deployed mobile apps for the organization's ERP system serving thousands of field agents.

Java · Kotlin · .NET · SQL Server · Fintech

Riseup Labs · Sep 2019 - Dec 2020
Software Engineer / Team Lead
Dhaka, Bangladesh

Carried software products through the complete development life cycle. Collaborated with organizations including UNICEF, FAO, the Ministry of ICT, the Ministry of Health, and the Ministry of Education on data-driven projects.

Python · Data Mining · BigQuery · Government Projects

AppconSoft · Nov 2014 - Apr 2017
Software Developer
Dhaka, Bangladesh

Built the Akor search engine with enterprise-level web crawlers using Elasticsearch and Apache Solr. Processed 10-25 million pages daily with custom scrapers in Python, PHP, and C.

Apache Solr · Elasticsearch · Web Crawling · Selenium

AppsDrone · Feb 2013 - Oct 2014
Programmer
Dhaka, Bangladesh

My first professional role. Developed an Android app marketplace analytics platform with Play Store scraping and data mining capabilities.

Data Mining · Play Store Scraping · Analytics

Education

IU International University
M.Sc. Artificial Intelligence
2024 - 2026
Aarhus University
Master's, Computer Engineering
2024
Shanto-Mariam University
B.Sc. Computer Science
2016 - 2020

Bored?

View this page from your terminal or IDE.

arman@scraper:~/arman-bd.github.io$
curl -s https://arman-bd.github.io | sed '/<pre/,/<\/pre>/d' | grep 'data-cli=' | sed -n \
  -e 's/.*data-cli="name">\([^<]*\).*/\x1b[32m============\n\1\n============\x1b[0m/p' \
  -e 's/.*data-cli="title">\([^<]*\).*/\x1b[36m\1\x1b[0m/p' \
  -e 's/.*data-cli="tagline" data-text="\([^"]*\)".*/\x1b[90m\1\x1b[0m/p' \
  -e 's/.*data-cli="link" data-label="\([^"]*\)" href="mailto:\([^"]*\)".*/  \1: \2/p' \
  -e 's/.*data-cli="link" data-label="\([^"]*\)" href="\([^"]*\)".*/  \1: \2/p' \
  -e 's/.*data-cli="section">\([^<]*\).*/\n\x1b[33m[\1]\x1b[0m/p' \
  -e 's/.*data-cli="story" data-year="\([^"]*\)" data-text="\([^"]*\)".*/  \x1b[36m\1:\x1b[0m \2/p' \
  -e 's/.*data-cli="item">\([^<]*\).*/  - \1/p' \
  -e 's/.*data-cli="project" data-name="\([^"]*\)" data-stars="\([^"]*\)" href="\([^"]*\)".*/  \x1b[36m* \1 (\2): \3\x1b[0m/p' \
  -e 's/.*data-cli="exp" data-info="\([^"]*\)".*/  \x1b[1m> \1\x1b[0m/p' \
  -e 's/.*data-cli="edu" data-info="\([^"]*\)".*/  \x1b[1m> \1\x1b[0m/p' \
  -e 's/.*data-cli="blog" data-name="\([^"]*\)" href="\([^"]*\)".*/  \x1b[36m* \1: \2\x1b[0m/p' \
  -e 's/.*data-cli="footer">\([^<]*\).*/\n\x1b[90m\1\x1b[0m/p'

The same thing in Python:

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://arman-bd.github.io").text, "html.parser")
[pre.decompose() for pre in soup.find_all("pre")]

for el in soup.find_all(attrs={"data-cli": True}):
    t, v = el["data-cli"], el.get_text(strip=True)
    if t == "name": print(f"\033[32m{'='*12}\n{v}\n{'='*12}\033[0m")
    elif t == "title": print(f"\033[36m {v}\033[0m")
    elif t == "tagline": print(f"\033[90m {el.get('data-text', v)}\033[0m")
    elif t == "link": print(f"  {el['data-label']}: {el['href'].replace('mailto:','')}")
    elif t == "section": print(f"\n\033[33m[{v}]\033[0m")
    elif t == "story": print(f"  \033[36m{el['data-year']}:\033[0m {el['data-text']}")
    elif t == "item": print(f"  - {v}")
    elif t == "project": print(f"  \033[36m* {el['data-name']} ({el['data-stars']}): {el['href']}\033[0m")
    elif t == "exp": print(f"  \033[1m> {el['data-info']}\033[0m")
    elif t == "edu": print(f"  \033[1m> {el['data-info']}\033[0m")
    elif t == "blog": print(f"  \033[36m* {el['data-name']}: {el['href']}\033[0m")
    elif t == "footer": print(f"\n\033[90m{v}\033[0m")

Or in Node.js:

const cheerio = require("cheerio");
const https = require("https");

https.get("https://arman-bd.github.io", (res) => {
  let html = "";
  res.on("data", (chunk) => (html += chunk));
  res.on("end", () => {
    const $ = cheerio.load(html);
    $("pre").remove();

    $("[data-cli]").each((_, el) => {
      const $el = $(el);
      const t = $el.attr("data-cli");
      const v = $el.text().trim();

      if (t === "name") console.log(`\x1b[32m${"=".repeat(12)}\n${v}\n${"=".repeat(12)}\x1b[0m`);
      else if (t === "title") console.log(`\x1b[36m ${v}\x1b[0m`);
      else if (t === "tagline") console.log(`\x1b[90m ${$el.attr("data-text") || v}\x1b[0m`);
      else if (t === "link") console.log(`  ${$el.attr("data-label")}: ${$el.attr("href").replace("mailto:","")}`);
      else if (t === "section") console.log(`\n\x1b[33m[${v}]\x1b[0m`);
      else if (t === "story") console.log(`  \x1b[36m${$el.attr("data-year")}:\x1b[0m ${$el.attr("data-text")}`);
      else if (t === "item") console.log(`  - ${v}`);
      else if (t === "project") console.log(`  \x1b[36m* ${$el.attr("data-name")} (${$el.attr("data-stars")}): ${$el.attr("href")}\x1b[0m`);
      else if (t === "exp") console.log(`  \x1b[1m> ${$el.attr("data-info")}\x1b[0m`);
      else if (t === "edu") console.log(`  \x1b[1m> ${$el.attr("data-info")}\x1b[0m`);
      else if (t === "blog") console.log(`  \x1b[36m* ${$el.attr("data-name")}: ${$el.attr("href")}\x1b[0m`);
      else if (t === "footer") console.log(`\n\x1b[90m${v}\x1b[0m`);
    });
  });
});
require "net/http"
require "nokogiri"

doc = Nokogiri::HTML(Net::HTTP.get(URI("https://arman-bd.github.io")))
doc.css("pre").remove

doc.css("[data-cli]").each do |el|
  t = el["data-cli"]
  v = el.text.strip

  case t
  when "name" then puts "\e[32m#{"=" * 12}\n#{v}\n#{"=" * 12}\e[0m"
  when "title" then puts "\e[36m #{v}\e[0m"
  when "tagline" then puts "\e[90m #{el["data-text"] || v}\e[0m"
  when "link" then puts "  #{el["data-label"]}: #{el["href"].sub("mailto:", "")}"
  when "section" then puts "\n\e[33m[#{v}]\e[0m"
  when "story" then puts "  \e[36m#{el["data-year"]}:\e[0m #{el["data-text"]}"
  when "item" then puts "  - #{v}"
  when "project" then puts "  \e[36m* #{el["data-name"]} (#{el["data-stars"]}): #{el["href"]}\e[0m"
  when "exp" then puts "  \e[1m> #{el["data-info"]}\e[0m"
  when "edu" then puts "  \e[1m> #{el["data-info"]}\e[0m"
  when "blog" then puts "  \e[36m* #{el["data-name"]}: #{el["href"]}\e[0m"
  when "footer" then puts "\n\e[90m#{v}\e[0m"
  end
end