Importing Geospatial Data
You can import geospatial data using OmniSci Immerse or OmniSci mapdql in a variety of formats.
- Importing Geospatial Data Using OmniSci Immerse
- Importing Well-known Text
- Importing Legacy CSV/TSV Files
- Importing Geospatial Shape Files
- WGS84 Coordinate Compression
Note | If there is a potential for duplicate entries and you prefer to avoid loading duplicate rows, see How can I avoid creating duplicate rows? on the Troubleshooting page. |
Importing Geospatial Data Using OmniSci Immerse
You can use OmniSci Immerse to import geospatial data into the OmniSci Core Database.
Supported formats include:
- Keyhole Markup Language (
.kml
) - GeoJSON (
.geojson
) - Shapefiles (
.shp
)
.shp
, .shx
, .dbf
, and .prj
. If you do not import the .prj
file, the coordinate system will be incorrect and you cannot render the shapes on a map.
To import geospatial definition data:
-
Open OmniSci Immerse.
-
Click Data Manager.
-
Click Import Data.
-
Click the large
+
icon to select files for upload, or drag and drop the files to the Data Importer screen.When importing shapefiles, upload all required file types at the same time. If you upload them separately, Immerse issues an error message.
-
Wait for the uploads to complete (indicated by green checkmarks on the file icons), then click Import Files.
-
On the Table Preview screen, edit column headers (if needed), enter a name for the table in the field at the bottom of the screen, and click Save Table.
-
On the Table Metadata screen, verify the rows and columns that compose your data table.
Importing Well-known Text
You can import spatial representations in Well-known Text (WKT) format. WKT is a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects and transformations between spatial reference systems.
Note | When representing longitude and latitude in OmniSci geospatial primitives, the first coordinate is assumed to be longitude by default. |
Creating WKT tables using mapdql
You can use mapdql
to define tables with columns that store WKT geospatial objects.
mapdql> \d geo CREATE TABLE geo ( p POINT, l LINESTRING, poly POLYGON)
Insert
You can use mapdql
to insert data as WKT string values.
mapdql> INSERT INTO geo values('POINT(20 20)', 'LINESTRING(40 0, 40 40)', 'POLYGON(( 0 0, 40 0, 40 40, 0 40, 0 0 ))');
Importing Delimited Files
You can insert data from CSV/TSV files containing WKT strings.
> cat geo.csv "p", "l", "poly" "POINT(1 1)", "LINESTRING( 2 0, 2 2)", "POLYGON(( 1 0, 0 1, 1 1 ))" "POINT(2 2)", "LINESTRING( 4 0, 4 4)", "POLYGON(( 2 0, 0 2, 2 2 ))" "POINT(3 3)", "LINESTRING( 6 0, 6 6)", "POLYGON(( 3 0, 0 3, 3 3 ))" "POINT(4 4)", "LINESTRING( 8 0, 8 8)", "POLYGON(( 4 0, 0 4, 4 4 ))" mapdql> COPY geo FROM 'geo.csv'; Result Loaded: 4 recs, Rejected: 0 recs in 0.356000 secs
You can use your own custom delimiter in your data files.
> cat geo1.csv "p", "l", "poly" POINT(5 5); LINESTRING(10 0, 10 10); POLYGON(( 5 0, 0 5, 5 5 )) mapdql> COPY geo FROM 'geo1.csv' WITH (delimiter=';', quoted=’false’); Result Loaded: 1 recs, Rejected: 0 recs in 0.148000 secs
Importing Legacy CSV/TSV Files
You can import CSV/TSV files intended for tables that store longitude and latitude as either:
- Separate consecutive scalar columns
- A POINT field.
If the data is stored as a POINT field, ST_DISTANCE and ST_CONTAINS functions can be used in spatial joins and in calculations.
Note | When lon and lat are loaded into a POINT column, they are displayed as such when querying the table:
mapdql> select * from destination_points; name|pt Just Fishing Around|POINT (-85.499999999727588 44.6929999755849 Moonlight Cove Waterfront|POINT (-85.5046011346879 44.6758447935227) |
You must first create the table and then use the COPY
command to populate it from the CSV/TSV file.
> cat legacy_geo.csv "lon", "lat" -71.160281, 42.258729 -71.160837, 42.259113 mapdql> \d new_geo CREATE TABLE new_geo ( p POINT) mapdql> COPY new_geo FROM 'legacy_geo.csv'; Result Loaded: 2 recs, Rejected: 0 recs in 0.152000 secs
The default behavior is to consume the two columns as longitude (x) and then latitude (y). If the order of the coordinates in the CSV file is the opposite, you can load the data using the WITH option lonlat='false'
:
mapdql> COPY new_geo FROM 'legacy_geo.csv' with (lonlat='false');
You can have columns on either side of the point field—the lon/lat in the source file does not have to be at the beginning or end of the target table. You can have fields on either side of the lon/lat pair.
As long as the source CSV/TSV file has lon/lat adjacent to each other, you can import the file into a geometry table and OmniSci automatically maps the lon/lat to a point field. You can change the order if the source data is in lat/lon order.
Importing Geospatial Shape Files
You can create Geo tables by importing specific Geo file formats. OmniSci supports the following types:
- ESRI Shapefile (
.shp
and associated files) - GeoJSON (
.geojson
or.json
) - KML (
.kml
or.kmz
)
The geo file import process automatically creates the table by detecting the column names and types explicitly described in the geo file header. It then creates a single geo column (always called omnisci_geo) which is of one of the supported types (POINT
, LINESTRING
, POLYGON
or MULTIPOLYGON
).
Note | Due to the prevalence of mixed POLYGON/MULTIPOLYGON geo files (and CSVs), if OmniSci detects a POLYGON type geo file, OmniSci creates a MULTIPOLYGON column and imports the data as single polygons. |
You import geo files using the COPY FROM
commmand using the ‘geo’ option.
mapdql> COPY states FROM 'states.shp' with (geo=’true’); mapdql> COPY zipcodes FROM 'zipcodes.geojson' with (geo=’true’); mapdql> COPY cell_towers FROM 'cell_towers.kml' with (geo=’true’);
If the table does not already exist, it is created automatically.
If the table already exists, and the data in the geo file has exactly the same column structure, the new file is appended to the existing table. This enables the import of large geo data sets that are split across multiple files. The new file is rejected if it does not have the same column structure.
Note | When appending geo files containing POLYGON or MULTIPOLYGON data, the spatial distribution of the polygons in the existing rows must be evaluated before the new data is imported. This might take some time with large tables and can cause a geometrical slowdown if a large number of files are appended. To mitigate this slowdown, combine data into the minimum possible number of files. |
By default, geo data is stored as GEOMETRY
, converted to Earth coordinates in lon/lat (WGS84 SRID 4326), compressing the data size by 50% (giving a worst-case real-world resolution of ~4 inches at the equator).
To store in other formats, use the following WITH options in addition to geo=’true’:
geo_coords_encoding=’NONE|COMPRESSED(32)’ // default is COMPRESSED(32)’ geo_coords_srid=4326|3857|900913 // default is 4326
Note | Rendering of geo LINESTRING , POLYGON and MULTIPOLYGON might only be possible with data stored in the default lon/lat (WGS84 SRID 4326) format (although the type and encoding are flexible). |
Importing Geo Files from Archives or Non-Local Storage
You can import geo files directly from archive files (for example, .zip .tar .tgz .tar.gz) without unpacking the archive. You can directly import individual geo files compressed with Zip or GZip (GeoJSON and KML only). The server opens the archive header and loads the first candidate file it finds (.shp .geojson .json .kml), along with any associated files (in the case of an ESRI Shapefile, the associated files must be siblings of the first).
$ unzip -l states.zip Archive: states.zip Length Date Time Name --------- ---------- ----- ---- 0 2018-02-13 11:09 states/ 446116 2017-11-06 12:15 states/cb_2014_us_state_20m.shp 8434 2017-11-06 12:15 states/cb_2014_us_state_20m.dbf 9 2017-11-06 12:15 states/cb_2014_us_state_20m.cpg 165 2017-11-06 12:15 states/cb_2014_us_state_20m.prj 516 2017-11-06 12:15 states/cb_2014_us_state_20m.shx --------- ------- 491525 6 files mapdql> COPY states FROM 'states.zip' with (geo=’true’); mapdql> COPY zipcodes FROM 'zipcodes.geojson.gz' with (geo=’true’); mapdql> COPY zipcodes FROM 'zipcodes.geojson.zip' with (geo=’true’); mapdql> COPY cell_towers FROM 'cell_towers.kml.gz' with (geo=’true’);
You can import geo files or archives directly from an Amazon S3 bucket.
mapdql> COPY states FROM 's3://mybucket/myfolder/states.shp' with (geo=’true’); mapdql> COPY states FROM 's3://mybucket/myfolder/states.zip' with (geo=’true’); mapdql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.gz' with (geo=’true’); mapdql> COPY zipcodes FROM 's3://mybucket/myfolder/zipcodes.geojson.zip' with (geo=’true’);
You can provide Amazon S3 credentials, if required, by setting variables in the environment of the mapdql
process…
AWS_REGION=us-west-1 AWS_ACCESS_KEY_ID=******************** AWS_SECRET_ACCESS_KEY=****************************************
You can also provide your credentials explicitly in the COPY FROM command.
mapdql> COPY states FROM 's3://mybucket/myfolder/states.zip' WITH (geo=’true’, s3_region=’us-west-1’, s3_access_key=’********************’, s3_secret_key=’****************************************’);
You can import geo files or archives directly from an HTTP/HTTPS website.
mapdql> COPY states FROM 'http://www.mysite.com/myfolder/states.zip' with (geo=’true’);
WGS84 Coordinate Compression
You can extend a column type specification to include spatial reference (SRID) and compression mode information.
Geospatial objects declared with SRID 4326 are compressed 50% by default with ENCODING COMPRESSED(32). In the following definition of table geo2, the columns poly2 and mpoly2 are compressed.
CREATE TABLE geo2 ( p2 GEOMETRY(POINT, 4326) ENCODING NONE, l2 GEOMETRY(LINESTRING, 900913), poly2 GEOMETRY(POLYGON, 4326), mpoly2 GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));
COMPRESSED(32)
compression provides a smaller memory footprint and faster query execution at the cost of lower resolution (about 4 inches at the equator).
You can disable compression by explicitly choosing ENCODING NONE
.