Configuration Parameters
OmniSci has minimal configuration requirements with a number of additional configuration options. This topic describes the required and optional configuration changes you can use in your OmniSci instance.
Data Directory
Before starting the OmniSci server, you must initialize the persistent data directory. To do so, create an empty directory at the desired path, such as /var/lib/mapd. Create the environment variable $MAPD_STORAGE.
export MAPD_STORAGE=/var/lib/mapd
Create the directory, then change its owner to the user that the server will run as ($MAPD_USER):
sudo mkdir -p $MAPD_STORAGE
sudo chown -R $MAPD_USER $MAPD_STORAGE
Where $MAPD_USER is the system user account that the server runs as, such as mapd, and $MAPD_STORAGE is the path to the parent of the OmniSci server data directory.
Finally, run $MAPD_PATH/bin/initdb with the data directory path as the argument:
$MAPD_PATH/bin/initdb $MAPD_STORAGE
Configuration File
OmniSci supports storing options in a configuration file. This is useful if, for example, you need to run the OmniSci server and web server on ports different than the defaults.
If you store a copy of mapd.conf in the $MAPD_STORAGE directory, the configuration settings are picked up automatically by the sudo systemctl start mapd_server and sudo systemctl start mapd_web_server commands.
Set the flags in the configuration file using the format <flag> = <value>. Strings must be enclosed in quotes. The following is a sample configuration file. The entry for the data path is a string and must be in quotes. The last entry in the first section, for null-div-by-zero, is the Boolean value true and does not require quotes.
port = 9091
http-port = 9090
data = "/var/lib/mapd/data"
null-div-by-zero = true

[web]
port = 9092
frontend = "/opt/mapd/frontend"
servers-json = "/var/lib/mapd/servers.json"
enable-https = true
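To make the file layout concrete, here is a minimal sketch that writes the sample settings to a scratch file and reads individual values back, distinguishing the top-level entries from those under [web]. The conf_get helper and the CONF_FILE variable are hypothetical names invented for this example; OmniSci does not ship such a tool, and its own parser may treat edge cases differently.

```shell
# Scratch copy of the sample configuration (on a real system the file
# would live in $MAPD_STORAGE).
CONF_FILE=$(mktemp)
cat > "$CONF_FILE" <<'EOF'
port = 9091
http-port = 9090
data = "/var/lib/mapd/data"
null-div-by-zero = true

[web]
port = 9092
frontend = "/opt/mapd/frontend"
servers-json = "/var/lib/mapd/servers.json"
enable-https = true
EOF

# conf_get FILE SECTION KEY
# Print the value of KEY in SECTION; pass "" for the top-level entries.
# Hypothetical helper for illustration only.
conf_get() {
  awk -v want_section="$2" -v want_key="$3" '
    /^[ \t]*\[.*\][ \t]*$/ {              # section header such as [web]
      sub(/^[ \t]*\[/, ""); sub(/\][ \t]*$/, "")
      section = $0
      next
    }
    section == want_section {
      eq = index($0, "=")
      if (eq == 0) next                   # skip blank or malformed lines
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim whitespace around the key
      gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim whitespace around the value
      if (key == want_key) {
        gsub(/^"|"$/, "", val)            # strip the quotes around strings
        print val
        exit
      }
    }
  ' "$1"
}

conf_get "$CONF_FILE" ""  port   # top-level port
conf_get "$CONF_FILE" web port   # port under [web]
```

The same flag name, port, appears twice in the sample, so any lookup has to take the section into account.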
Command Line Parameters
You can make ad hoc changes to your configuration by specifying parameters on the command line at runtime. Prefix the parameter name with two hyphens, followed by any required argument. For example, the following command starts mapd_server using a temporary configuration file.
sudo systemctl start mapd_server --config ~/temp.conf
Configuration Parameters for OmniSci Server
Flag | Description | Implied Value | Default Value | Why Change It? |
---|---|---|---|---|
allow-cpu-retry [=arg] | Allow the queries which failed on GPU to retry on CPU, even when watchdog is enabled. | TRUE[1] | FALSE [0] | When watchdog is enabled most queries that run on GPU and throw a watchdog exception will fail. Turn this on to allow queries which fail the watchdog on GPU to retry on CPU. The default behavior is for queries that run out of memory on GPU to throw an error if watchdog is enabled (note that watchdog is enabled by default). |
allow-loop-joins [=arg] | Enable loop joins | TRUE[1] | FALSE [0] | This flag enables the loop join implementation, as opposed to the default hash join implementation. Queries loop over all rows from all tables involved in the join, and evaluate the join condition. Loop joins can be effective when you compare a large inner dataset to a small outer dataset. When both datasets are large, performance is predictably slower. In most scenarios, hash-join (default) performance is superior to loop-join performance. There are two cases when you might use loop joins: |
bigint-count [=arg] | Use 64-bit count | FALSE[0] | FALSE[0] | This setting is disabled by default because 64-bit integer atomics are slow on GPUs. If you see negative values for a count, indicating overflow, enable this setting. If your data set has more than 4 billion records, you will likely need to turn this on. |
calcite-max-mem arg | Max memory available to Calcite JVM | | 1024 | Change if Calcite reports out of memory errors. |
calcite-port arg | Calcite port number | | 9093 | Change to avoid collisions with ports already in use. |
config arg | Path to mapd.conf | | $MAPD_STORAGE | Change for testing and debugging. |
cpu | Run on CPU only | | FALSE | One use case for disabling GPUs would be during database conversion. That requires moving a large amount of data with minimal processing. |
cpu-buffer-mem-bytes arg | Size of memory reserved for CPU buffers [bytes] | | 0 | Change to restrict the amount of CPU/system memory OmniSci Core can consume. A default value of 0 indicates no limit on CPU memory use (OmniSci Server uses all available CPU memory on the system). |
cuda-block-size arg | Size of block to use on GPU | | 0 | GPU performance tuning: number of threads per block. Note that a default of 0 means use all threads per block. |
cuda-grid-size arg | Size of grid to use on GPU | | 0 | GPU performance tuning: number of blocks per device. Note that a default of 0 means use all available blocks per device. |
data arg | Directory path to OmniSci catalogs | | $MAPD_STORAGE | Change for testing and debugging. |
db-query-list arg | Path to file containing OmniSci queries | N/A | N/A | Use a query list to autoload data to GPU memory on startup to speed performance. |
dynamic-watchdog-time-limit [=arg] | Dynamic watchdog time limit, in milliseconds | 10000 | 100000 | Change if Dynamic Watchdog is stopping queries which are expected to take longer than this limit. |
enable-access-priv-check [=arg] | Check user access privileges to database objects | TRUE[1] | TRUE[1] | Set to FALSE to disable the privileges model. This is essentially the same as running with only superusers. |
enable-debug-timer [=arg] | Enable fine grained query execution timers for debug. | TRUE[1] | FALSE [0] | For debugging, logs verbose timing information for query execution (time to load data, time to compile code, etc). |
enable-dynamic-watchdog [=arg] | Enable dynamic watchdog | TRUE[1] | FALSE [0] | |
enable-filter-push-down [=arg(=1)] (=0) | Enable filter push down through joins. | TRUE[1] | FALSE[0] | Evaluates filters in the query expression for selectivity and pushes down highly selective filter into the join according to selectivity parameters. See also What is Predicate Pushdown? |
enable-overlaps-hashjoin [=arg(=1)] (=0) | Enable the overlaps hash join framework allowing for range join (e.g. spatial overlaps) computation using a hash table. | TRUE[1] | FALSE[0] | |
enable-watchdog [arg] | Enable watchdog | TRUE[1] | TRUE[1] | |
filter-push-down-low-frac | Higher threshold for selectivity of filters which are pushed down. | | filter-push-down-low-frac = 0.1 | Filters with selectivity less than this threshold are considered for a push down. |
filter-push-down-passing-row-ubound | Upper bound on the number of rows that should pass the filter if the selectivity is less than the high fraction threshold. | | filter-push-down-passing-row-ubound = 4000000 | |
flush-log [arg] | Immediately flush logs to disk. | TRUE[1] | TRUE[1] | Set to FALSE if this is a performance bottleneck. |
from-table-reordering [=arg(=1)] (=1) | Enable automatic table reordering in FROM clause | TRUE[1] | TRUE[1] | Automatic FROM clause table reordering re-orders the sequence of a join to place large tables on the inside of the join clause and smaller tables on the outside. OmniSci also reorders tables between join clauses to prefer hash joins over loop joins. You should not need to change this value except in special cases where OmniSci engineers are working directly with you to resolve an issue. |
gpu | Run on GPUs (Default) | | TRUE | One use case for disabling GPUs would be during database conversion. That requires moving a large amount of data with minimal processing. |
gpu-buffer-mem-bytes arg | Size of memory reserved for GPU buffers [bytes] (per GPU) | | 0 | Restricts the amount of memory a single process uses, so that when running multitenancy in the cloud several processes can all use the same GPUs. |
hll-precision-bits [=arg] | Number of bits used from the hash value used to specify the bucket number. | 11 | 11 | Change to increase/decrease approx_count_distinct() precision. Increased precision decreases performance. |
http-port arg | HTTP port number | | 9090 | Change to avoid collisions with ports already in use. |
idle-session-duration arg | Maximum duration of an idle session, in minutes. | | 60 | Change to increase or decrease duration of an idle session before timeout. |
inner-join-fragment-skipping [=arg(=1)] (=0) | Enable/disable inner join fragment skipping. | | | Enables skipping fragments for improved performance during inner join operations. |
license arg | Path to file containing license key | | | Change if your provided license file is located in a different location or has a different name. |
max-session-duration arg | Maximum duration of the active session, in minutes. | | 30 | Change to increase or decrease session duration before timeout. |
null-div-by-zero [=arg] | Allows processing to complete when the dataset would cause a div/0 error. | | 0 | Set to TRUE if you prefer to return null when dividing by zero, FALSE to throw an exception. |
num-gpus arg | Number of GPUs to use | | -1 | In a shared environment, you can assign the number of GPUs to a particular application. The default, -1, means use all available GPUs. |
num-reader-threads arg | Number of reader threads to use | | 0 | Drop the number of reader threads to prevent imports from taking all available CPU power. Default is to use all threads. |
overlaps-bucket-threshold arg (=0.10000000000000001) | The minimum size of a bucket corresponding to a given inner table range for the overlaps hash join. | | 0.10000000000000001 | |
read-only [=arg] | Enable read-only mode | TRUE[1] | FALSE[0] | Prevents inadvertent (or purposeful) changes to the dataset. |
render-mem-bytes arg | Size of memory reserved for rendering [bytes] | | 500000000 | This allocation is performed at startup on each configured GPU. It is static and persists while the server is running unless you execute a \clear_gpu_memory command. Increase if rendering a large number of points/symbols and you have received out of memory exceptions that read "Not enough OpenGL memory to render the query results" during rendering. Default is 500 MB. |
render-poly-cache-bytes arg | Size of memory reserved for polygon rendering [bytes] | | 300000000 | The polygon render cache is used to improve polygon rendering performance from frame to frame when rendering the same query. More complex queries are often used with polygon rendering, such as choropleths that use expensive joins and aggregates. In addition, the processing time required to build the polygon buffers for rendering can be relatively expensive. To mitigate poor performance with successive rebuilds of query results and polygon render buffers, the polygon cache can be used. This configuration flag limits the maximum size of the polygon render cache. In contrast to 'render-mem-bytes', there is no allocation performed at startup for this configuration flag, so if no polygon rendering is ever performed, no allocations will have been executed that count toward this limit. Polygon buffer allocations are performed dynamically when requested. If the query results and polygon buffer sizes exceed the limit of the cache, the render can still be executed (as long as there's enough GPU memory to do so), but you may see performance degradation from frame to frame. If you notice poor performance from frame to frame with polygon rendering, you may want to consider increasing this cache size. You can get hints from the INFO log as to what a more appropriate setting should be. For instance, if you see a log message such as this: "Cannot cache |
rendering [=arg] | Enable/disable backend rendering | TRUE[1] | TRUE[1] | Disable rendering when it is not in use. This frees up the memory set aside for rendering by the render-mem-bytes option. To re-enable rendering, you must restart mapd_server. |
res-gpu-mem =arg | Reserved memory for GPU, not used by the OmniSci allocator. | | 134217728 | OmniSci is very greedy: it takes all the memory on the GPU except for (render-mem-bytes + res-gpu-mem). All of render-mem-bytes is allocated at startup. The res-gpu-mem flag allows you to reserve some extra memory for your system (for example, if your GPU is also driving your display, like on a laptop or single card desktop). This is also a useful flag if you have other processes sharing the GPU with OmniSci, such as a machine learning pipeline. In advanced rendering scenarios or distributed setups, increasing `res-gpu-mem` allows the system to grab additional memory for the renderer, or for aggregating results for the renderer from multiple leaf nodes. |
start-gpu arg | First GPU to use | | FALSE[0] | |
trivial-loop-join-threshold [=arg] | The maximum number of rows in the inner table of a loop join considered to be trivially small | 1000 | 1000 | |
Enterprise Edition Additional Parameters | ||||
cluster arg | Path to data leaves list JSON file. Indicates that the OmniSci server instance is an aggregator node, and where to find the rest of its cluster. | | $MAPD_STORAGE | Change for testing and debugging. |
compression-limit-bytes [=arg(=536870912)] (=536870912) | Compress result sets that are transferred between leaves. | 536870912 | 536870912 | Minimum length of payload above which data is compressed. |
compressor arg (=lz4hc) | Compressor algorithm to be used by the server to compress data being transferred between servers. | lz4hc | lz4hc | See Data Compression for compression algorithm options. |
ha-brokers arg | Location of the HA brokers. | | | Point to Kafka broker used for High Availability. |
ha-group-id arg | ID of the HA group this server is in. | | | Change to match the group ID used for all servers in the OmniSci Core High Availability group. |
ha-shared-path arg | Directory path to shared OmniSci directory. | | | Required part of the High Availability OmniSci Core setup. Specifies the shared file storage that allows multiple OmniSci Core servers to function as a High Availability cluster. |
ha-unique-server-id arg | Unique ID to identify this server in the HA group. | | | Change to assign a unique ID to this server in the OmniSci High Availability group. |
ldap-dn arg | LDAP Distinguished Name (DN). | | (=uid=%s, cn=users, cn=accounts, dc=mapd, dc=com) | |
ldap-role-query-regex arg | RegEx to use to extract role from role query result. | | | |
ldap-role-query-url arg | LDAP query role URL. | | | |
ldap-superuser-role arg | The role name to identify a superuser. | | | |
ldap-uri arg | LDAP server URI. | | | |
leaf-conn-timeout [=arg] | Leaf connect timeout, in milliseconds. | 20000 | 20000 | Increase/decrease to fail Thrift connections between OmniSci Core instances more or less quickly if a connection cannot be established. |
leaf-recv-timeout [=arg] | Leaf receive timeout, in milliseconds. | 300000 | 300000 | Increase/decrease to fail Thrift connections between OmniSci Core instances more or less quickly if data is not received in the time allotted. |
leaf-send-timeout [=arg] | Leaf send timeout, in milliseconds. | 300000 | 300000 | Increase/decrease to fail Thrift connections between OmniSci Core instances more or less quickly if data is not sent in the time allotted. |
saml-metadata-file arg | Path to Identity provider metadata file. | | | This is a required flag for running SAML. An Identity provider (like Okta) supplies a metadata file. From this file, OmniSci uses: |
saml-sp-target-url arg | URL of the service provider for which SAML assertions should be generated. | | | This is a required flag for running SAML. It is used to verify that a SAML token was issued for OmniSci and not for some other service. |
saml-sync-roles arg (=0) | Enable mapping of SAML groups to MapD roles. | | saml-sync-roles [ = 0] | The SAML Identity provider (for example, Okta) automatically creates users at login and assigns them roles they already have as groups in SAML. |
string-servers arg | Path to string servers list JSON file. | | | Indicates that OmniSci Core is running in distributed mode. Required to designate a leaf server when running in distributed mode. |
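The res-gpu-mem description in the table implies a simple per-GPU budget: whatever remains after subtracting render-mem-bytes and res-gpu-mem from the total GPU memory is what the OmniSci allocator can take. A minimal sketch of that arithmetic follows, using the default values from the table. The 16 GiB total is a made-up figure for illustration, and the variable and function names are hypothetical.

```shell
# Defaults taken from the table above.
RENDER_MEM_BYTES=500000000   # render-mem-bytes
RES_GPU_MEM=134217728        # res-gpu-mem (128 MiB)

# Hypothetical total memory of one GPU: 16 GiB.
GPU_TOTAL_BYTES=$((16 * 1024 * 1024 * 1024))

# allocator_budget TOTAL RENDER RESERVED
# Print the bytes left for the allocator: TOTAL - (RENDER + RESERVED).
allocator_budget() {
  echo $(( $1 - ($2 + $3) ))
}

allocator_budget "$GPU_TOTAL_BYTES" "$RENDER_MEM_BYTES" "$RES_GPU_MEM"   # prints 16545651456
```

This estimate ignores gpu-buffer-mem-bytes, which restricts the memory a single process uses when set to a nonzero value, and the polygon render cache, which is allocated dynamically rather than at startup.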
Configuration Parameters for OmniSci Web Server
Flag | Description | Default | Why Change It? |
---|---|---|---|
-b, backend-url string | URL to http-port on mapd_server | http://localhost:9090 | Change to avoid collisions with other services. |
cert string | Certificate file for HTTPS | cert.pem | Change for testing and debugging. |
-c, config string | Path to OmniSci configuration file | | Change for testing and debugging. |
-d, data string | Path to OmniSci data directory | data | Change for testing and debugging. |
db-query-list <path-to-query-list-file> | Pre-load data to memory based on SQL queries stored in a list file. | n/a | Automatically run queries that load the most frequently used data to enhance performance. See Pre-loading Data. |
docs string | Path to documentation directory | docs | Change if you move your documentation files to another directory. |
enable-https | Enable HTTPS support | | Change to enable secure HTTP. |
-f, frontend string | Path to frontend directory | frontend | Change if you move the location of your frontend UI files. |
key string | Key file for HTTPS | key.pem | Change for testing and debugging. |
-p, port int | Frontend server port | 9092 | Change to avoid collisions with other services. |
-r, read-only | Enable read-only mode | | Prevent inadvertent (or nefarious) changes to the data. |
servers-json string | Path to servers.json | | Change for testing and debugging. |
timeout duration | Maximum request duration in #h#m#s format. For example 0h30m0s represents a duration of 30 minutes. | 1h0m0s | The --timeout option controls the maximum duration of individual HTTP requests. This is used to manage resource exhaustion caused by improperly closed connections. One side effect is that it limits the execution time of queries made over the Thrift HTTP transport. This timeout duration must be increased if queries are expected to take longer than the default duration of one hour; for example, if you perform a COPY FROM on a large file when using mapdql with the HTTP transport. |
tmpdir string | Path for temporary file storage | /tmp | The temporary directory is used as a staging location for file uploads. You might want to place this directory on the same file system as the OmniSci data directory. If not specified on the command line, mapd_web_server also respects the standard TMPDIR environment variable as well as a specific MAPD_TMPDIR environment variable, the latter of which takes precedence. If you use neither the command-line argument nor one of the environment variables, the default, /tmp/ is used. |
-v, verbose | Print all log messages to stdout | | Change for testing and debugging. |
version | Return version | | |
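Two of the web server options have semantics that are easier to see in code: the #h#m#s format used by timeout, and the order in which tmpdir falls back to environment variables. The following sketch illustrates both in plain POSIX shell. The function names are hypothetical, and the code only mirrors the behavior described in the table; it is not how mapd_web_server itself parses these values.

```shell
# timeout_to_seconds DURATION
# Convert a #h#m#s string such as 0h30m0s to seconds.
# Assumes all three fields are present and are plain decimal numbers
# without leading zeros. Uses global variables, as POSIX sh has no "local".
timeout_to_seconds() {
  d=$1
  h=${d%%h*}          # text before the first "h"
  rest=${d#*h}        # text after the first "h"
  m=${rest%%m*}       # text before the first "m"
  rest=${rest#*m}     # text after the first "m"
  s=${rest%s}         # drop the trailing "s"
  echo $(( h * 3600 + m * 60 + s ))
}

# resolve_tmpdir [CLI_VALUE]
# Apply the precedence described for tmpdir: the command-line value,
# then MAPD_TMPDIR, then TMPDIR, then the default /tmp.
resolve_tmpdir() {
  echo "${1:-${MAPD_TMPDIR:-${TMPDIR:-/tmp}}}"
}

timeout_to_seconds 1h0m0s      # prints 3600
timeout_to_seconds 0h30m0s     # prints 1800
resolve_tmpdir /data/upload    # prints /data/upload
```

Called with no argument, resolve_tmpdir prints the value of MAPD_TMPDIR if it is set, otherwise TMPDIR, otherwise /tmp.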