How a Browser Works: A Beginner-Friendly Guide to Browser Internals

Have you ever wondered what happens when you type a URL and press Enter, and within seconds a webpage appears with all its text, images, and colors? What goes on behind the scenes to transform simple text files into a polished UI (user interface)?
In this article, we'll look at how a browser actually works and explore its internals.

Topics to Cover

  1. What a browser actually is

  2. Main parts of a browser

  3. User Interface: address bar, tabs, buttons

  4. Browser Engine vs Rendering Engine

  5. Networking: how a browser fetches HTML, CSS, JS

  6. HTML parsing and DOM creation

  7. CSS parsing and CSSOM creation

  8. How DOM and CSSOM come together

What a browser actually is

A browser is a software application that retrieves information from the web, interprets it, and displays the content on your device. In simple terms, its job is to take a "recipe" (HTML, CSS, and JavaScript) and cook it into a finished "meal" (the interactive website).

Main parts of a browser

A browser consists of several major components, including the User Interface, the Browser Engine, the Rendering Engine, and the networking component.

User Interface: address bar, tabs, buttons

User Interface: The UI is the visible part of the browser: everything except the webpage content itself.

Components of the UI:

  • Address Bar: Where you type web addresses. Modern address bars also run searches, show security status, and provide suggestions.

  • Navigation Buttons: Back, Forward, Refresh, and Home buttons that control page navigation.

  • Tabs: Allow multiple webpages to be open simultaneously in one window.

  • Browser Menu: Settings, history, downloads, and extensions.

  • Bookmarks Bar: Quick access to saved pages.

Browser Engine vs Rendering Engine

Browser Engine: The high-level coordinator. It marshals actions between the UI and the rendering engine. When you click the refresh button in the UI, the browser engine tells the rendering engine to reload the page.

Think of the browser engine as a manager who coordinates between different departments but doesn't do the detailed work itself.

Rendering Engine: The component that actually interprets HTML and CSS, builds the page structure, and displays content.

Think of the rendering engine as the artist who does the actual work of creating the visual representation.
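
To make the manager/artist split concrete, here is a minimal sketch of the delegation. The class and method names (BrowserEngine, RenderingEngine, navigate, refresh, load) are made up for illustration and do not correspond to any real browser's code.

```python
class RenderingEngine:
    """Hypothetical stand-in for the component that parses and paints a page."""

    def __init__(self):
        self.current_url = None
        self.load_count = 0

    def load(self, url):
        # A real engine would fetch, parse, lay out, and paint here.
        self.current_url = url
        self.load_count += 1


class BrowserEngine:
    """Hypothetical coordinator sitting between the UI and the rendering engine."""

    def __init__(self, rendering_engine):
        self.rendering_engine = rendering_engine

    def navigate(self, url):
        self.rendering_engine.load(url)

    def refresh(self):
        # The UI's refresh button ends up here; the actual work is delegated.
        if self.rendering_engine.current_url is not None:
            self.rendering_engine.load(self.rendering_engine.current_url)


renderer = RenderingEngine()
browser = BrowserEngine(renderer)
browser.navigate("https://example.com/")
browser.refresh()
print(renderer.load_count)  # 2
```

The browser engine never touches HTML or CSS itself; it only decides when the rendering engine should do its work.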

Popular Rendering Engines:

  • Blink: Used by Chrome, Edge, Opera, Brave (most popular)

  • WebKit: Used by Safari

  • Gecko: Used by Firefox

These engines have the same job (render web content) but implement details differently, which is why websites sometimes look or behave slightly differently across browsers.

Networking: How a Browser Fetches HTML, CSS, and JavaScript

Before the browser can display anything, it needs to get the files. This is where the networking component comes in.

The Journey from URL to Files

When you press Enter after typing a URL:

1. DNS Lookup: The browser contacts DNS servers to translate the domain name (example.com) into an IP address (for example, 93.184.216.34).

2. TCP Connection: The browser establishes a TCP connection with the server at that IP address (the three-way handshake we discussed in previous posts).

3. HTTP Request: The browser sends an HTTP GET request asking for the webpage:

GET / HTTP/1.1
Host: example.com

4. Server Response: The server sends back HTML:

HTTP/1.1 200 OK
Content-Type: text/html

<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <h1>Hello World</h1>
    <script src="script.js"></script>
  </body>
</html>

5. Additional Requests: The browser parses this HTML and sees references to CSS (style.css) and JavaScript (script.js). It immediately makes additional HTTP requests to fetch these files.

6. Other Resources: As parsing continues, the browser discovers and fetches images, fonts, videos, and other resources.
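
Steps 3 to 5 above can be sketched without touching the network. The snippet below is a rough illustration, not how a real browser is implemented: it builds the request text, splits a canned response into status, headers, and body, and then scans the HTML for the stylesheet and script it references. The function and class names are invented for this example, and the response bytes are simply the sample from this article.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def build_get_request(url):
    """Build the raw bytes of a minimal HTTP/1.1 GET request for url."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.hostname}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into status code, headers, and body text."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body.decode("utf-8")


class ResourceFinder(HTMLParser):
    """Collect the URLs of stylesheets and scripts referenced by the HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.resources.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "script" and attributes.get("src"):
            self.resources.append(urljoin(self.base_url, attributes["src"]))


# Canned response matching the example above, so nothing is actually fetched.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b'<!DOCTYPE html><html><head><link rel="stylesheet" href="style.css"></head>'
    b'<body><h1>Hello World</h1><script src="script.js"></script></body></html>'
)

request = build_get_request("http://example.com/")
status, headers, html = parse_response(SAMPLE_RESPONSE)
finder = ResourceFinder("http://example.com/")
finder.feed(html)
print(request.decode("ascii"))
print(status, headers["content-type"], finder.resources)
```

Relative references such as style.css are resolved against the page URL, which is why the browser ends up requesting http://example.com/style.css.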

Parsing: Breaking Down Code into Understandable Structures

Once the browser has the files, it needs to parse them—break them down into structures it can work with. Let's understand parsing with a simple analogy.

What is Parsing?

Imagine you're given this mathematical expression:

3 + 4 * 5

To evaluate this, your brain doesn't just see characters—it parses the expression into meaning:

  • First, recognize the numbers: 3, 4, 5

  • Then, recognize the operators: +, *

  • Finally, understand the structure (order of operations): multiply first, then add

You might mentally construct a tree:

       +
      / \
     3   *
        / \
       4   5

This tree represents the structure: "Add 3 to the result of (4 times 5)."

Parsing is exactly this: breaking text into tokens (smallest meaningful units) and organizing them into a structure that represents their meaning and relationships.
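
Here is a small sketch of that idea for the expression above. It handles only non-negative integers, + and *, which is just enough to show tokenizing and tree building; the function names are chosen for this example.

```python
import re

TOKEN_PATTERN = re.compile(r"\s*(\d+|[+*])")


def tokenize(text):
    """Split an expression into number and operator tokens."""
    tokens = []
    position = 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def parse_sum(tokens):
    """sum := product ('+' product)*"""
    node = parse_product(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        node = ("+", node, parse_product(tokens))
    return node


def parse_product(tokens):
    """product := number ('*' number)*"""
    node = int(tokens.pop(0))
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        node = ("*", node, int(tokens.pop(0)))
    return node


def evaluate(node):
    """Walk the tree bottom-up to compute the result."""
    if isinstance(node, int):
        return node
    operator, left, right = node
    if operator == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)


tree = parse_sum(tokenize("3 + 4 * 5"))
print(tree)            # ('+', 3, ('*', 4, 5))
print(evaluate(tree))  # 23
```

Because products are parsed one level below sums, the multiplication ends up deeper in the tree and is evaluated first, exactly as in the diagram.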

HTML Parsing and DOM Creation

The DOM (Document Object Model) is a tree structure representing the webpage. It's the browser's internal model of the HTML.

From HTML to DOM

When the browser receives HTML:

<!DOCTYPE html>
<html>
  <head>
    <title>My Page</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <p>This is a paragraph.</p>
  </body>
</html>

The parsing process:

  1. Tokenization: Break HTML into tokens (tags, text, attributes)
   <html> → Start tag token
   <head> → Start tag token
   <title> → Start tag token
   "My Page" → Text token
   </title> → End tag token
   ...
  2. Tree Construction: Build a tree structure from these tokens
                    Document
                       |
                    <html>
                    /    \
               <head>    <body>
                 |       /    \
              <title>  <h1>  <p>
                 |      |     |
            "My Page" "Welcome" "This is..."

This tree is the DOM. Each node represents an element or text in the HTML. The structure captures parent-child relationships: <body> is a child of <html>, <h1> is a child of <body>, etc.
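
The two steps can be approximated with Python's standard html.parser module. This is a simplified sketch, not the real HTML parsing algorithm: it does no error recovery, does not handle void elements such as <link>, and the Node and DomBuilder classes are made up for this example.

```python
from html.parser import HTMLParser


class Node:
    """A DOM-like node: an element with a tag name, or a text node."""

    def __init__(self, tag=None, text=None):
        self.tag = tag
        self.text = text
        self.children = []


class DomBuilder(HTMLParser):
    """Record tokens as they arrive and build a tree from them."""

    def __init__(self):
        super().__init__()
        self.document = Node(tag="#document")
        self.open_elements = [self.document]  # stack of currently open elements
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))
        element = Node(tag=tag)
        self.open_elements[-1].children.append(element)
        self.open_elements.append(element)

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))
        # Close the most recently opened element with this tag name, if any.
        for index in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[index].tag == tag:
                del self.open_elements[index:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.tokens.append(("text", text))
            self.open_elements[-1].children.append(Node(text=text))


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.tag if node.tag else repr(node.text)
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


HTML = """<!DOCTYPE html>
<html>
  <head>
    <title>My Page</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <p>This is a paragraph.</p>
  </body>
</html>"""

builder = DomBuilder()
builder.feed(HTML)
builder.close()
print(builder.tokens[:5])
print("\n".join(outline(builder.document)))
```

The stack of open elements is what turns a flat stream of tokens into parent-child relationships: each new node is attached to whatever element is currently on top.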

Why the DOM Matters

The DOM is crucial because:

  • JavaScript interacts with the DOM: When you write document.querySelector('h1'), you're querying this tree structure

  • The browser uses the DOM for rendering: It traverses this tree to determine what to display

  • It represents document structure: The hierarchy defines which elements contain others

Key Insight: The DOM is not the same as your HTML source code. It's a living, in-memory representation that JavaScript can modify. When you change the DOM with JavaScript, the visual page updates, but your original HTML file doesn't change.
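
The sketch below illustrates both ideas on a hand-built tree of dictionaries: a depth-first search that finds the first element with a given tag name, and a change to the tree that leaves the source text untouched. It is only an analogy; the real document.querySelector accepts full CSS selectors, and find_first is a name invented for this example.

```python
def find_first(node, tag):
    """Depth-first, document-order search for the first element named tag."""
    if node.get("tag") == tag:
        return node
    for child in node.get("children", []):
        found = find_first(child, tag)
        if found is not None:
            return found
    return None


SOURCE_HTML = "<body><h1>Welcome</h1><p>This is a paragraph.</p></body>"

# Hand-built tree standing in for the DOM a browser would create from SOURCE_HTML.
dom = {
    "tag": "body",
    "children": [
        {"tag": "h1", "children": [{"text": "Welcome"}]},
        {"tag": "p", "children": [{"text": "This is a paragraph."}]},
    ],
}

heading = find_first(dom, "h1")
heading["children"][0]["text"] = "Hello"  # comparable to changing the heading's text from a script

print(dom["children"][0]["children"][0]["text"])  # Hello
print("Welcome" in SOURCE_HTML)                   # True
```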

CSS Parsing and CSSOM Creation

Just as HTML becomes the DOM, CSS becomes the CSSOM (CSS Object Model).

From CSS to CSSOM

When the browser receives CSS:

body {
  font-size: 16px;
  color: #333;
}

h1 {
  color: blue;
  font-size: 32px;
}

p {
  margin: 10px;
}

The parsing process:

  1. Tokenization: Break CSS into tokens (selectors, properties, values)
   body → Selector
   { → Start block
   font-size → Property
   16px → Value
   ...
  2. CSSOM Tree Construction: Build a tree representing CSS rules
              CSSOM
                |
        ┌───────┼───────┐
        │       │       │
      body     h1       p
        |       |       |
    [styles] [styles] [styles]
   font-size  color   margin
   color    font-size
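
As a rough sketch of this step, the snippet below turns the stylesheet above into a list of rules, each with a selector and its declarations. It only understands flat "selector { property: value; }" rules and ignores comments, at-rules, and nesting; parse_css is a name chosen for this example.

```python
import re

RULE_PATTERN = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(css_text):
    """Parse flat 'selector { property: value; }' rules into a list."""
    rules = []
    for selector, block in RULE_PATTERN.findall(css_text):
        declarations = {}
        for declaration in block.split(";"):
            if ":" not in declaration:
                continue  # skip the empty piece after the final semicolon
            name, _, value = declaration.partition(":")
            declarations[name.strip()] = value.strip()
        rules.append({"selector": selector.strip(), "declarations": declarations})
    return rules


CSS = """
body {
  font-size: 16px;
  color: #333;
}

h1 {
  color: blue;
  font-size: 32px;
}

p {
  margin: 10px;
}
"""

rules = parse_css(CSS)
for rule in rules:
    print(rule["selector"], rule["declarations"])
```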

Cascading and Inheritance

The CSSOM represents not just individual rules but the cascade: how styles from different sources are combined and overridden. From lowest to highest priority, those sources are:

  • Browser default styles (base)

  • User styles (if any)

  • Author styles (your CSS)

  • Inline styles (highest priority, unless a competing declaration is marked !important)

The browser resolves all of these to determine the final style for each element. This is why they are called "Cascading" Style Sheets: styles cascade from general to specific, inherited properties such as color and font-size pass from parent to child, and when two rules have the same origin and specificity, the later one wins.
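
A minimal sketch of that resolution, assuming the four sources listed above as a simple ranking. It leaves out !important and treats only color and font-size as inherited. The tuple layout, the function names, and the 2em browser default for h1 are assumptions made for this example.

```python
ORIGIN_RANK = {"user-agent": 0, "user": 1, "author": 2, "inline": 3}
INHERITED_PROPERTIES = {"color", "font-size"}  # small subset, for illustration


def cascade(declarations):
    """Pick one winning value per property.

    Each declaration is (origin, specificity, order, property, value).
    Higher origin rank wins, then higher specificity, then later source order.
    """
    winners = {}
    ranked = sorted(declarations, key=lambda d: (ORIGIN_RANK[d[0]], d[1], d[2]))
    for _origin, _specificity, _order, name, value in ranked:
        winners[name] = value  # stronger declarations come later and overwrite
    return winners


def inherit(parent_style, own_style):
    """Fill in inherited properties that the element did not set itself."""
    style = {k: v for k, v in parent_style.items() if k in INHERITED_PROPERTIES}
    style.update(own_style)
    return style


# Specificity is written as (ids, classes, element names).
body_style = cascade([
    ("author", (0, 0, 1), 0, "font-size", "16px"),
    ("author", (0, 0, 1), 0, "color", "#333"),
])
h1_own = cascade([
    ("user-agent", (0, 0, 1), 0, "font-size", "2em"),  # assumed browser default
    ("author", (0, 0, 1), 1, "color", "blue"),
    ("author", (0, 0, 1), 1, "font-size", "32px"),
])
p_own = cascade([("author", (0, 0, 1), 2, "margin", "10px")])

h1_style = inherit(body_style, h1_own)
p_style = inherit(body_style, p_own)
print(h1_style)
print(p_style)
```

The h1 keeps its own color and font-size, while the p, which sets neither, picks both up from body.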

How DOM and CSSOM Come Together: The Render Tree

The browser now has two trees:

  • DOM: The structure and content

  • CSSOM: The styles

To display the page, it combines them into a Render Tree.

Building the Render Tree

The render tree contains only the elements that will be visible on the page, each with its computed styles:

DOM:                    CSSOM:              Render Tree:

<html>                  body {...}          <html> (visible)
  <head>                h1 {...}              <body> (visible, styled)
    <title>...</title>  p {...}                 <h1> (visible, styled)
  <body>                                        <p> (visible, styled)
    <h1>...</h1>
    <p>...</p>


(Invisible elements like <head>, <title>,
 and elements with display:none are excluded)

Key Points:

  1. Only visible elements: <head>, <script>, and elements with display: none are excluded from the render tree

  2. Computed styles: Each node has its final, computed styles after cascading, inheritance, and defaults are applied

  3. Visual hierarchy: The tree structure represents what needs to be painted and in what order

Example: If your CSS has:

h1 {
  color: blue;
  font-size: 32px;
  display: none; /* This h1 won't be in render tree! */
}

The <h1> element exists in the DOM (JavaScript can still find it) but doesn't appear in the render tree (won't be displayed).
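
To tie the pieces together, here is a simplified sketch of render tree construction for this example. It walks a hand-built DOM, looks up each element's computed style by tag name (a real engine matches full selectors per element), and skips anything that is never rendered or has display: none. Text nodes are left out for brevity, and the names used are made up for illustration.

```python
NEVER_RENDERED = {"head", "title", "script", "style", "meta", "link"}


def build_render_tree(node, styles):
    """Return a render-tree node for a DOM node, or None if it is not rendered."""
    tag = node["tag"]
    style = styles.get(tag, {})
    if tag in NEVER_RENDERED or style.get("display") == "none":
        return None  # the whole subtree is skipped
    children = []
    for child in node.get("children", []):
        rendered = build_render_tree(child, styles)
        if rendered is not None:
            children.append(rendered)
    return {"tag": tag, "style": style, "children": children}


dom = {"tag": "html", "children": [
    {"tag": "head", "children": [{"tag": "title"}]},
    {"tag": "body", "children": [{"tag": "h1"}, {"tag": "p"}]},
]}

# Computed styles keyed by tag name, including the display: none from the example.
styles = {
    "body": {"font-size": "16px", "color": "#333"},
    "h1": {"color": "blue", "font-size": "32px", "display": "none"},
    "p": {"margin": "10px"},
}

render_tree = build_render_tree(dom, styles)
body = render_tree["children"][0]
print([child["tag"] for child in body["children"]])  # ['p']
```

The dom dictionary still contains the h1, which mirrors the point above: the element stays in the DOM but never makes it into the render tree.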