In my Getting started with Oracle BigData Blog I shared how you can obtain an example Avro file, if you don’t have access to a Avro file you can visit my previous Blog to get started.
Bigdata Tools
The Apache Avro file stores data in a row-based binary format and therefore not human readable, fortunately we can use avro-tools to inspect the data and schema information.
On my Mac I have installed both parquet-tools and avro-tools using Brew e.g.
brew install parquet-tools brew install avro-tools
avro-tools help
$ avro-tools -h
Version 1.9.1
of Apache Avro
Copyright 2010-2015 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (https://www.apache.org/).
----------------
Available tools:
canonical Converts an Avro Schema to its canonical form
cat Extracts samples from files
compile Generates Java code for the given schema.
concat Concatenates avro files without re-compressing.
fingerprint Returns the fingerprint for the schemas.
fragtojson Renders a binary-encoded Avro datum as JSON.
fromjson Reads JSON records and writes an Avro data file.
fromtext Imports a text file into an avro data file.
getmeta Prints out the metadata of an Avro data file.
getschema Prints out schema of an Avro data file.
idl Generates a JSON schema from an Avro IDL file
idl2schemata Extract JSON schemata of the types from an Avro IDL file
induce Induce schema/protocol from Java class/interface via reflection.
jsontofrag Renders a JSON-encoded Avro datum as binary.
random Creates a file with randomly generated instances of a schema.
recodec Alters the codec of a data file.
repair Recovers data from a corrupt Avro Data file
rpcprotocol Output the protocol of a RPC service
rpcreceive Opens an RPC Server and listens for one message.
rpcsend Sends a single RPC message.
tether Run a tethered mapreduce job.
tojson Dumps an Avro data file as JSON, record per line or pretty.
totext Converts an Avro data file to a text file.
totrevni Converts an Avro data file to a Trevni file.
trevni_meta Dumps a Trevni file's metadata as JSON.
trevni_random Create a Trevni file filled with random instances of a schema.
trevni_tojson Dumps a Trevni file as JSON.
avro-tools getschema
The getschema option provides a pretty JSON formatted output of the schema.
rekins$ avro-tools getschema movie.avro { "type" : "record", "name" : "Movie", "namespace" : "oracle.avro", "fields" : [ { "name" : "movie_id", "type" : "int", "default" : 0 }, { "name" : "title", "type" : "string", "default" : "" }, { "name" : "year", "type" : "int", "default" : 0 }, { "name" : "budget", "type" : "int", "default" : 0 }, { "name" : "gross", "type" : "double", "default" : 0 }, { "name" : "plot_summary", "type" : "string", "default" : "" } ] }
avro-tools getmeta
The getmeta option can also be used to provided a formatted output of the schema.
rekins$ avro-tools getmeta movie.avro avro.schema {"type":"record","name":"Movie","namespace":"oracle.avro","fields":[{"name":"movie_id","type":"int","default":0},{"name":"title","type":"string","default":""},{"name":"year","type":"int","default":0},{"name":"budget","type":"int","default":0},{"name":"gross","type":"double","default":0},{"name":"plot_summary","type":"string","default":""}]} avro.codec null
avro-tools tojson
We can use the tojson –pretty option to format output on each line, and the –head option to return the first n rows.
rekins$ avro-tools tojson --pretty --head 1 movie.avro { "movie_id" : 1054798, "title" : "Crank: High Voltage", "year" : 2009, "budget" : 20000000, "gross" : 3.4560577E7, "plot_summary" : "Crank: High Voltage (promoted as Crank 2: High Voltage in some regions and on DVD) is a 2009 American action film and sequel to the 2006 action film, Crank. The story of the film resumes shortly after the first film left off, retaining its real-time presentation and adding more special effects. Crank: High Voltage was written and directed by Mark Neveldine and Brian Taylor, who both wrote and directed the previous film. The film was released in the United Kingdom on April 15, 2009, two days prior to its North American release date." }
Oracle Database 19.3
I have used the same local directory for both the Avro and Parquet files as per my previous Blog. Doing this on-premises requires a little bit more effort than in the Oracle Cloud as can’t use the dbms_cloud.create_external_table procedure.
Please note currently the DBMS_CLOUD package is not available to on-premise Enterprise Edition releases.
Local Avro file
$ ls -lh /home/oracle/bigdata/movie.avro -rw-r--r--. 1 oracle oinstall 324K Jan 3 16:20 /home/oracle/bigdata/movie.avro
Create Oracle Directory
CREATE OR REPLACE DIRECTORY BIGDATA_DIR AS '/home/oracle/bigdata'
Create Oracle External Table
Below I have used the details captured using avro-tools to create an external table.
CREATE TABLE MOVIE_LOCAL ( MOVIE_ID NUMBER(10), TITLE VARCHAR2(4000), YEAR NUMBER(10), BUDGET NUMBER(10), GROSS BINARY_DOUBLE, PLOT_SUMMARY VARCHAR2(4000) ) ORGANIZATION EXTERNAL (TYPE ORACLE_BIGDATA DEFAULT DIRECTORY BIGDATA_DIR ACCESS PARAMETERS ( com.oracle.bigdata.debug=TRUE com.oracle.bigdata.log.opt=normal com.oracle.bigdata.fileformat=avro ) location (BIGDATA_DIR:'movie.avro') ) REJECT LIMIT UNLIMITED /
Accessing via SQL*Plus
As we have now created an Oracle Directory, External Table, and have a local avro file we can try accessing it from within Oracle SQL*Plus


set feed off set pages 100 set linesize 150 column MOVIE_ID heading 'Movie Id' column TITLE heading 'Title' format a20 COLUMN BUDGET FORMAT 999,999,999 Heading 'Budget' COLUMN GROSS FORMAT 999,999,999 Heading 'Gross' column PLOT_SUMMARY heading 'Plot Summary' format a60 select MOVIE_ID, TITLE, YEAR, BUDGET, GROSS, PLOT_SUMMARY from movie_local where rownum=1;

My next step is now to repeat my on-premise tests using a Free Oracle Cloud account and the ‘DBMS_CLOUD’ package to compare experience.
[twitter-follow screen_name=’RonEkins’ show_count=’yes’]