Penn’s Data Warehouse is moving to Snowflake, the Cloud data platform. This migration from our long-standing Oracle environment enables us to leverage modern technology and integration opportunities and enhanced analytical and security capabilities, while continuing to provide authorized access to our critical institutional data assets.
What this means for the Data Warehouse client community:
- Data structures are moving to Snowflake as is, there are no data model changes, and names of tables and views are unchanged.
- All historical data from our on-premises database will be available on Snowflake.
- Queries can generally run as is with no changes, with some minor changes in syntax in some cases (e.g. use of system date).
- End-user security will be migrated as is to Snowflake.
The Data Warehouse becomes fully available on Snowflake early in calendar year 2025. The tabs below provide resources for getting started with Snowflake, details about connecting to the Data Warehouse there, and information about syntax and technical differences from Oracle.
If you have questions about the Warehouse on Snowflake please contact Enterprise Information & Analytics.
The move to the Cloud introduces some additional security safeguards for access to the Data Warehouse:
- Multi-factor authentication (MFA): Individuals accessing the Data Warehouse on Snowflake for the first time must enroll in multi-factor authentication.
- IP whitelisting: In addition, anyone accessing the Data Warehouse from off-campus must register their IP address to access Snowflake. Access from PennNet addresses on campus is automatically allowed.
If you'd like to learn more about the Snowflake platform, the vendor provides excellent, publically available resources:
- Snowflake’s online documentation: https://docs.snowflake.com/
- Snowflake's Data Warehousing Workshop (https://learn.snowflake.com/en/courses/uni-essdww101/) steps you through exercises in Snowflake’s training environment. Note that this requires registering with Snowflake for a free, temporary trial account in order to take the course.
Authorized users may connect to the Data Warehouse on Snowflake through a variety of means: BusinessObjects, JDBC or ODBC connections from other tools, or direct access through Snowsight, Snowflake's web interface. Regardless of the software used to connect, all end-users must enroll in Snowflake's multi-factor authentication (MFA) and connect from an authorized IP address. Schools and centers with automated processes or applications should contact Enterprise Information & Analytics for instructions to request a service account.
Options for connecting to Snowflake:
- BusinessObjects: BusinessObjects universe connections will be redirected to the Data Warehouse on Snowflake. Only universes in the BusinessObjects UNX format can connect to Snowflake, so end-user reports must be migrated to these universes.
- Other tools connection via JDBC or ODBC: Snowflake's provides documentation description confiiguration details for JDBC or ODBC connections. ODBC connection configuration parameters are detailed in this How-To guide.
- Snowsight: Authorized users can log in directly to Snowflake and write their own Data Warehouse queries using the Snowsight web interface. The Snowflake Account name for the Data Warehouse is UPENN1.
Syntax, Format, and Function Differences to Keep in Mind
For by far the majority of queries, SQL from Oracle can be used with no changes in Snowflake. There are some nuances regarding functions, formats, and so on differ from Oracle and are listed below. For more on Snowflake syntax, see their extensive documentation at https://docs.snowflake.com/.
- Null handling:
- Concat or || will return a null if one of the strings is null. To alleviate this, the syntax should be concat(IFNULL(string1, ''), IFNULL(string2,''))
- Concat only works with strings, so if one of the columns is numeric, use to_varchar within the concat. Example - concat(IFNULL(to_varchar(period_number),'' ))
- Working with dates:
- Be very careful when working with SYSDATE in Snowflake. SYSDATE is Oracle syntax to get the current date/time; Snowflake allows SYSDATE() only with function brackets, but it returns the timestamp in UTC only (not in our local timezome). Snowflake has several variations, e.g.: Current_Date(), Current_Timestamp(), etc. Try them to pick the appropriate value for the output you need and see https://docs.snowflake.com/en/user-guide/date-time-examples for further information.
- Example: Oracle query that didn't work in Snowflake: SELECT to_date( concat (to_char ( trunc(sysdate), 'MM/DD/YYYY'), ' 23:59:59'), 'MM/DD/YYYY hh24:mi:ss') from dual;
- Replaced with SNOWFLAKE version: SELECT to_timestamp(concat(to_char(current_timestamp(), 'YYYY-MM-DD'),' 23:59:59'), 'YYYY-MM-DD hh24:mi:ss') ;
- Be very careful when working with SYSDATE in Snowflake. SYSDATE is Oracle syntax to get the current date/time; Snowflake allows SYSDATE() only with function brackets, but it returns the timestamp in UTC only (not in our local timezome). Snowflake has several variations, e.g.: Current_Date(), Current_Timestamp(), etc. Try them to pick the appropriate value for the output you need and see https://docs.snowflake.com/en/user-guide/date-time-examples for further information.
- INSTR is not supported in Snowflake; use POSITION or potentially REGEXP_INSTR()
- Working with sequences: To advance a sequence in Snowflake - SELECT schema.sequencename_SEQ.nextval; - drop the Oracle "from dual"
- LISTAGG function in Snowflake - sorting output list requires the 'within group (ORDER_BY xxx)' statement in the function; using Order by in the query will not work. (See https://docs.snowflake.com/en/sql-reference/functions/listagg)
- VALIDATE_CONVERSION function in Oracle checks to see if the data can be converted to datatype and returns null if it cannot instead of throwing an error. In Snowflake the equivalent is "A special version of TO_DECIMAL , TO_NUMBER , TO_NUMERIC that performs the same operation (i.e. converts an input expression to a fixed-point number), but with error-handling support (i.e. if the conversion cannot be performed, it returns a NULL value instead of raising an error)."
- NUMBER columns in Snowflake tables should be formatted with precision and scale. By default, precision is 38 and scale is 0 (i.e. NUMBER(38, 0) ). Where Oracle would hold decimals even if scale wasn’t defined, Snowflake needs the full format specified or data may be truncated.
- The function to check if a column is numeric try_to_number does not work as expected. As an alternative, use REGEXP_LIKE(xyz , '^[+-]?([0-9]*[.])?[0-9]+$')
- FLOAT/DOUBLE datatypes - Oracle stores these types as binary but displays it as a decimal. Use UTL_RAW functions to see the binary representation (hex) of the number in Oracle. https://docs.oracle.com/database/timesten-18.1/TTPLP/u_raw.htm#TTPLP072 To see the actual float value - use the IEEE floating point converter https://www.h-schmidt.net/FloatConverter/IEEE754.html. The result is that if you move the data as floating point Oracle to floating point in Snowflake, Oracle might display 16.3 whereas Snowflake will display 16.299999237. The stored hex value in Oracle is actually 41826666.
- Standard Deviation: The function to calculate standard deviation (STDDEV) in Oracle and Snowflake does not always return the same value, specifically pertaining to NULL and ZERO. Per Snowflake documentation, "For single-record inputs, STDDEV and STDDEV_SAMP both return NULL. This is different from the Oracle behavior, where STDDEV_SAMP returns NULL for a single record and STDDEV returns 0."