Posts tonen met het label Data Profiling. Alle posts tonen
Posts tonen met het label Data Profiling. Alle posts tonen

zaterdag 31 oktober 2015

Script to find unique values of all columns in a table (data profiling)

Script to find unique values of all columns in a table (data profiling)

Someone in the community was looking for a script to subtract all unique values for columns in a table. There are several ways to do this, for example use a DISTINCT on each column and gather your lists that way.
You could also build your query using dynamic sql and all_tab_columns so you don't have to program the columns.
Next to that there are more advanced ways using Java and Hashing, but a very simple script that may be useful in some cases could be something like this. Ofcourse note that I use varchar as index here, where your columns could be dates, numbers, etc.
If data analysis however is a major thing, you may also like to check out the Big Data Preparation Cloud (https://cloud.oracle.com/bigdatapreparation). Pretty cool stuff which does profiling, but also provides mapping rules and detects data that might need obscuring like SSN, credit card numbers, etc.

DECLARE
  TYPE
    T_Values IS TABLE OF VARCHAR2(240) INDEX BY VARCHAR2(240);
  CURSOR C_All_Rows
  IS
  SELECT City
  ,      Country
  FROM   XXX_TEST
  ;
 
  l_cities   T_Values;
  l_countries T_Values;
  i           VARCHAR2(240);
 
BEGIN
  <<Rows>>
  FOR R IN C_All_Rows
  LOOP
    IF l_cities.exists (R.City) THEN NULL; ELSE l_cities (R.City) := R.City; END IF;
 IF l_countries.exists (R.Country) THEN NULL; ELSE l_countries (R.Country) := R.Country; END IF;
  END LOOP Rows;
 
  dbms_output.put_line ('Number of countries: ' || l_countries.count);
  dbms_output.put_line ('Number of cities   : ' || l_cities.count);
 
 
  i := l_countries.first;
  WHILE i IS NOT NULL LOOP
    dbms_output.put_line (l_countries (i));
    i := l_countries.next(i);
  END LOOP;
 
    i := l_cities.first;
  WHILE i IS NOT NULL LOOP
    dbms_output.put_line (l_cities (i));
    i := l_cities.next(i);
  END LOOP;
 
 
END;