zaterdag 31 oktober 2015

Script to find unique values of all columns in a table (data profiling)

Script to find unique values of all columns in a table (data profiling)

Someone in the community was looking for a script to subtract all unique values for columns in a table. There are several ways to do this, for example use a DISTINCT on each column and gather your lists that way.
You could also build your query using dynamic sql and all_tab_columns so you don't have to program the columns.
Next to that there are more advanced ways using Java and Hashing, but a very simple script that may be useful in some cases could be something like this. Ofcourse note that I use varchar as index here, where your columns could be dates, numbers, etc.
If data analysis however is a major thing, you may also like to check out the Big Data Preparation Cloud (https://cloud.oracle.com/bigdatapreparation). Pretty cool stuff which does profiling, but also provides mapping rules and detects data that might need obscuring like SSN, credit card numbers, etc.

DECLARE
  TYPE
    T_Values IS TABLE OF VARCHAR2(240) INDEX BY VARCHAR2(240);
  CURSOR C_All_Rows
  IS
  SELECT City
  ,      Country
  FROM   XXX_TEST
  ;
 
  l_cities   T_Values;
  l_countries T_Values;
  i           VARCHAR2(240);
 
BEGIN
  <<Rows>>
  FOR R IN C_All_Rows
  LOOP
    IF l_cities.exists (R.City) THEN NULL; ELSE l_cities (R.City) := R.City; END IF;
 IF l_countries.exists (R.Country) THEN NULL; ELSE l_countries (R.Country) := R.Country; END IF;
  END LOOP Rows;
 
  dbms_output.put_line ('Number of countries: ' || l_countries.count);
  dbms_output.put_line ('Number of cities   : ' || l_cities.count);
 
 
  i := l_countries.first;
  WHILE i IS NOT NULL LOOP
    dbms_output.put_line (l_countries (i));
    i := l_countries.next(i);
  END LOOP;
 
    i := l_cities.first;
  WHILE i IS NOT NULL LOOP
    dbms_output.put_line (l_cities (i));
    i := l_cities.next(i);
  END LOOP;
 
 
END;

Geen opmerkingen:

Een reactie posten

Opmerking: Alleen leden van deze blog kunnen een reactie posten.