yugabyte/spark-yugabytedb-dialect


YugabyteDB Dialect for Apache Spark

Overview

In Apache Spark, database dialects determine how Spark interacts with a database over JDBC. For a PostgreSQL URL (one starting with jdbc:postgresql:), Spark selects the PostgresDialect, which supports PostgreSQL-specific data types such as ArrayType by implementing the appropriate type mappings in functions such as getJDBCType().

However, when using the YugabyteDB JDBC driver with a URL starting with jdbc:yugabytedb:, Spark fails to match the URL to any known dialect and falls back to the NoopDialect, which lacks PostgreSQL-compatible features such as ArrayType handling. This mismatch causes processing errors when working with YugabyteDB in Spark.
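Spark chooses a dialect by asking each one whether it can handle the connection URL (the JdbcDialect canHandle check). The core of the fix is a dialect whose check accepts the YugabyteDB URL prefix. The sketch below illustrates the matching idea only; it is not the plugin's actual code, and the example URL and port are placeholders:

```java
// Illustrative sketch of URL-based dialect matching (not the plugin's actual code).
public class YugabyteDialectCheck {

    // Spark's built-in PostgresDialect only matches jdbc:postgresql: URLs.
    public static boolean postgresCanHandle(String url) {
        return url.toLowerCase().startsWith("jdbc:postgresql:");
    }

    // A YugabyteDB dialect must instead accept the jdbc:yugabytedb: prefix,
    // which no built-in Spark dialect matches.
    public static boolean yugabyteCanHandle(String url) {
        return url.toLowerCase().startsWith("jdbc:yugabytedb:");
    }

    public static void main(String[] args) {
        String ybUrl = "jdbc:yugabytedb://127.0.0.1:5433/yugabyte"; // placeholder URL
        System.out.println(postgresCanHandle(ybUrl)); // false -> Spark falls back to NoopDialect
        System.out.println(yugabyteCanHandle(ybUrl)); // true  -> a YugabyteDB dialect applies
    }
}
```
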

The YugabyteDBDialectPlugin resolves this issue by:

  • Providing a specific dialect for the YugabyteDB URL pattern.
  • Ensuring PostgreSQL-compatible features, including handling of ArrayType, are available when working with YugabyteDB.

By using this dialect, you enable seamless integration of YugabyteDB with Apache Spark, ensuring accurate type mappings and efficient processing.


Steps to Run the Application

Prerequisites

  1. Apache Spark: Ensure Spark 2.4.2 or later is installed and properly configured.
  2. JDK: Install JDK 8 or JDK 11.
  3. Maven: Ensure Maven is installed for building the application.

Build the Jar locally

1. Clone the Repository

git clone https://github.com/yugabyte/spark-yugabytedb-dialect-example.git
cd spark-yugabytedb-dialect-example

2. Build the Jar

mvn clean package

This generates a JAR file in the target directory. To install the JAR into your local Maven repository, run:

mvn install

Then include the dependency in your application's pom.xml:

<dependency>
    <groupId>com.yugabyte</groupId>
    <artifactId>spark-yugabytedb-dialect</artifactId>
    <version>3.5.4-yb-1</version>
</dependency>
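Alternatively, if you submit a Spark application directly rather than building it with Maven, the built JAR can be put on the classpath with spark-submit's --jars flag. The JAR filename, master URL, and application JAR below are illustrative placeholders:

```shell
# Illustrative: the jar name, master URL, and application jar are placeholders.
spark-submit \
  --class org.example.SparkYSQLExample \
  --master local[*] \
  --jars target/spark-yugabytedb-dialect-3.5.4-yb-1.jar \
  your-application.jar
```
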

3. Publish the Jar to Maven Central

mvn deploy -Dgpg.keyname=thekeyid

Replace thekeyid with the ID of the GPG key used to sign the artifacts.

4. Run the Test

Create the ysql_spark schema on your cluster:

create schema ysql_spark;

Run the test:

mvn exec:java -Dexec.mainClass="org.example.SparkYSQLExample" -Dexec.classpathScope="test"

Verify Output:

  • The application will insert data into the ysql_spark.student table and retrieve the following data:
+---+------------------+
| ID|           details|
+---+------------------+
|  2|[Mark, 23, Python]|
|  1|  [John, 35, Java]|
+---+------------------+
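The details column shown above is a PostgreSQL-style array, which arrives over JDBC encoded as a literal such as {Mark,23,Python}; with the dialect in place, Spark maps such values to ArrayType instead of failing. For intuition only, here is a naive sketch of splitting such a literal into elements (the dialect does this through Spark's JDBC type machinery, not code like this, and quoting/nesting are not handled):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: a naive parser for simple PostgreSQL array literals
// such as {Mark,23,Python} (quoted or nested elements are not handled).
public class PgArrayLiteral {

    public static List<String> parse(String literal) {
        String trimmed = literal.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new IllegalArgumentException("not an array literal: " + literal);
        }
        String body = trimmed.substring(1, trimmed.length() - 1);
        if (body.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(body.split(","));
    }

    public static void main(String[] args) {
        System.out.println(parse("{Mark,23,Python}")); // [Mark, 23, Python]
    }
}
```
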

About

An application demonstrating the use of the Spark YugabyteDB dialect plugin.
