Retrieving Open Graph (OG) tags from an HTML document in Java

Retrieving Open Graph (OG) tags in Java typically involves fetching a page's HTML and then parsing it to extract the <meta> tags whose property attribute starts with og:. Here's how to do it step by step:

1. Add Required Dependencies

This guide uses Jsoup, which can both fetch and parse HTML. If you're using Maven, add the following dependency to your pom.xml:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.16.1</version> <!-- Use the latest version -->
</dependency>

2. Code to Retrieve OG Tags

Here's a sample Java program to fetch and parse OG tags:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class OGTagExtractor {

    public static void main(String[] args) {
        String url = "https://example.com"; // Replace with the desired URL

        try {
            // Fetch the HTML content of the page
            Document document = Jsoup.connect(url).get();

            // Extract meta tags whose property attribute starts with "og:"
            Elements metaTags = document.select("meta[property^=og:]");
            // LinkedHashMap keeps the tags in document order
            Map<String, String> ogTags = new LinkedHashMap<>();

            for (Element tag : metaTags) {
                String property = tag.attr("property");
                String content = tag.attr("content");
                // A repeated property (e.g. a second og:image) overwrites the earlier value
                ogTags.put(property, content);
            }

            // Print extracted OG tags
            ogTags.forEach((key, value) ->
                System.out.println(key + ": " + value)
            );

        } catch (IOException e) {
            System.err.println("Error fetching the URL: " + e.getMessage());
        }
    }
}

3. How It Works

  • Fetch HTML Content: Jsoup.connect(url).get() sends an HTTP GET request to the given URL and parses the response into a Document.
  • Select OG Tags: document.select("meta[property^=og:]") finds all <meta> tags whose property attribute starts with og: (^= is the "starts with" attribute selector).
  • Extract Content: attr("property") and attr("content") read the property and content attributes of each tag.
  • Store and Print: Each property/content pair is stored in a Map and then printed.
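
One thing to keep in mind about the "Store" step: a page may repeat a property (several og:image tags, for example), and a plain Map keeps only one value per key. Below is a minimal sketch of one way to keep every value, assuming the property/content pairs have already been read from the meta tags. OgTagGrouper and group are hypothetical names used for illustration; they are not part of Jsoup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Jsoup.
class OgTagGrouper {

    // Groups (property, content) pairs by property, keeping every value
    // in the order it was seen instead of overwriting earlier ones.
    static Map<String, List<String>> group(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Sample pairs standing in for values read from <meta> tags
        List<Map.Entry<String, String>> pairs = List.of(
            Map.entry("og:title", "Example Title"),
            Map.entry("og:image", "https://example.com/a.jpg"),
            Map.entry("og:image", "https://example.com/b.jpg")
        );
        group(pairs).forEach((property, values) ->
            System.out.println(property + ": " + values));
    }
}
```

In the extraction loop, the same computeIfAbsent call could replace ogTags.put(property, content) if the map's value type is changed to List<String>.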

Example Output

For a page with these OG tags:

<meta property="og:title" content="Example Title">
<meta property="og:description" content="Example Description">
<meta property="og:image" content="https://example.com/image.jpg">

The output will be:

og:title: Example Title
og:description: Example Description
og:image: https://example.com/image.jpg
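
To try this without a network request, the sample markup can be parsed directly from a string with Jsoup.parse. The sketch below is one way to do that, assuming the Jsoup dependency from step 1 is on the classpath. OgTagSample and extract are hypothetical names, and a LinkedHashMap is used here so that the tags print in document order.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class for illustration; requires Jsoup on the classpath.
class OgTagSample {

    // Parses an HTML string (no network access) and returns the og:*
    // properties in the order they appear in the document.
    static Map<String, String> extract(String html) {
        Document document = Jsoup.parse(html);
        Map<String, String> ogTags = new LinkedHashMap<>();
        for (Element tag : document.select("meta[property^=og:]")) {
            ogTags.put(tag.attr("property"), tag.attr("content"));
        }
        return ogTags;
    }

    public static void main(String[] args) {
        // The same three tags as in the example above
        String html = "<html><head>"
            + "<meta property=\"og:title\" content=\"Example Title\">"
            + "<meta property=\"og:description\" content=\"Example Description\">"
            + "<meta property=\"og:image\" content=\"https://example.com/image.jpg\">"
            + "</head><body></body></html>";

        extract(html).forEach((key, value) ->
            System.out.println(key + ": " + value));
    }
}
```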

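This approach works for any page that includes its OG tags in the HTML returned by the server. Because Jsoup does not execute JavaScript, tags that are injected client-side will not be found.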
