Retrieving Open Graph (OG) tags from an HTML document in Java

Retrieving Open Graph (OG) tags in Java takes two steps: fetch the HTML content of a web page, then parse it to extract the <meta> tags whose property attribute starts with og:. Here's how to do it step by step:

1. Add Required Dependencies

Jsoup can handle both fetching and parsing the HTML. If you're using Maven, add the following dependency to your pom.xml:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.16.1</version> <!-- Use the latest version -->
</dependency>

2. Code to Retrieve OG Tags

Here's a sample Java program to fetch and parse OG tags:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class OGTagExtractor {

    public static void main(String[] args) {
        String url = "https://example.com"; // Replace with the desired URL

        try {
            // Fetch the HTML content of the page
            Document document = Jsoup.connect(url).get();

            // Extract meta tags with property="og:*"
            Elements metaTags = document.select("meta[property^=og:]");
            // LinkedHashMap keeps the tags in document order when printed
            Map<String, String> ogTags = new LinkedHashMap<>();

            for (Element tag : metaTags) {
                String property = tag.attr("property");
                String content = tag.attr("content");
                ogTags.put(property, content);
            }

            // Print extracted OG tags
            ogTags.forEach((key, value) -> 
                System.out.println(key + ": " + value)
            );

        } catch (IOException e) {
            System.err.println("Error fetching the URL: " + e.getMessage());
        }
    }
}

3. How It Works

  • Fetch HTML Content: Jsoup.connect(url).get() sends a GET request to the given URL and parses the response into a Document.
  • Select OG Tags: document.select("meta[property^=og:]") finds all <meta> tags whose property attribute starts with og:.
  • Extract Content: attr("property") and attr("content") read the property and content attributes of each tag. attr returns an empty string if the attribute is missing.
  • Store and Print: The tags are stored in a Map and printed. If a property appears more than once, the later value overwrites the earlier one.
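
The selection and extraction steps can also be tried without any network access. The following is a minimal sketch, assuming the same Jsoup dependency as above; the class name OGTagParseDemo, the helper extractOgTags, and the inline HTML are made up for illustration. It parses an HTML string with Jsoup.parse and shows that only <meta> tags whose property starts with og: are picked up.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class OGTagParseDemo {

    // Hypothetical helper: collects og:* meta tags from an already-parsed document.
    static Map<String, String> extractOgTags(Document document) {
        Map<String, String> ogTags = new LinkedHashMap<>();
        for (Element tag : document.select("meta[property^=og:]")) {
            ogTags.put(tag.attr("property"), tag.attr("content"));
        }
        return ogTags;
    }

    public static void main(String[] args) {
        // Illustrative HTML; the second <meta> tag has no og: property and is skipped.
        String html = "<html><head>"
                + "<meta property=\"og:title\" content=\"Example Title\">"
                + "<meta name=\"description\" content=\"Not an OG tag\">"
                + "<meta property=\"og:type\" content=\"website\">"
                + "</head><body></body></html>";

        Document document = Jsoup.parse(html);
        extractOgTags(document).forEach((key, value) ->
                System.out.println(key + ": " + value));
    }
}
```

Keeping the extraction in a method that takes a Document makes it easy to test against fixed HTML, separately from the network call.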

Example Output

For a page with these OG tags:

<meta property="og:title" content="Example Title">
<meta property="og:description" content="Example Description">
<meta property="og:image" content="https://example.com/image.jpg">

The output will be:

og:title: Example Title
og:description: Example Description
og:image: https://example.com/image.jpg
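
The example page has one tag per property, but the Open Graph protocol also allows a property to repeat, for instance several og:image tags on one page. A Map<String, String> filled with put keeps only one value per key. The sketch below is one possible way to keep every value, assuming the same Jsoup dependency; the class name OGTagMultiValueDemo, the method extractAll, and the sample HTML are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OGTagMultiValueDemo {

    // Hypothetical helper: groups the content values of og:* tags by property name.
    static Map<String, List<String>> extractAll(Document document) {
        Map<String, List<String>> ogTags = new LinkedHashMap<>();
        for (Element tag : document.select("meta[property^=og:]")) {
            ogTags.computeIfAbsent(tag.attr("property"), key -> new ArrayList<>())
                  .add(tag.attr("content"));
        }
        return ogTags;
    }

    public static void main(String[] args) {
        // Illustrative HTML with a repeated og:image property.
        String html = "<html><head>"
                + "<meta property=\"og:title\" content=\"Example Title\">"
                + "<meta property=\"og:image\" content=\"https://example.com/a.jpg\">"
                + "<meta property=\"og:image\" content=\"https://example.com/b.jpg\">"
                + "</head><body></body></html>";

        extractAll(Jsoup.parse(html)).forEach((key, values) ->
                System.out.println(key + ": " + values));
    }
}
```

Each property maps to a list in document order, so callers that expect a single value can take the first element.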

This approach works for any page that includes its OG tags in the served HTML. Jsoup does not execute JavaScript, so tags that a page injects client-side will not be found.
