Script to find unique values of all columns in a table (data profiling)
Someone in the community was looking for a script to subtract all unique values for columns in a table. There are several ways to do this, for example use a DISTINCT on each column and gather your lists that way.
You could also build your query using dynamic sql and all_tab_columns so you don't have to program the columns.
Next to that there are more advanced ways using Java and Hashing, but a very simple script that may be useful in some cases could be something like this. Ofcourse note that I use varchar as index here, where your columns could be dates, numbers, etc.
If data analysis however is a major thing, you may also like to check out the Big Data Preparation Cloud (https://cloud.oracle.com/bigdatapreparation). Pretty cool stuff which does profiling, but also provides mapping rules and detects data that might need obscuring like SSN, credit card numbers, etc.
DECLARE
TYPE
T_Values IS TABLE OF VARCHAR2(240) INDEX BY VARCHAR2(240);
CURSOR C_All_Rows
IS
SELECT City
, Country
FROM XXX_TEST
;
l_cities T_Values;
l_countries T_Values;
i VARCHAR2(240);
BEGIN
<<Rows>>
FOR R IN C_All_Rows
LOOP
IF l_cities.exists (R.City) THEN NULL; ELSE l_cities (R.City) := R.City; END IF;
IF l_countries.exists (R.Country) THEN NULL; ELSE l_countries (R.Country) := R.Country; END IF;
END LOOP Rows;
dbms_output.put_line ('Number of countries: ' || l_countries.count);
dbms_output.put_line ('Number of cities : ' || l_cities.count);
i := l_countries.first;
WHILE i IS NOT NULL LOOP
dbms_output.put_line (l_countries (i));
i := l_countries.next(i);
END LOOP;
i := l_cities.first;
WHILE i IS NOT NULL LOOP
dbms_output.put_line (l_cities (i));
i := l_cities.next(i);
END LOOP;
END;
Geen opmerkingen:
Een reactie posten
Opmerking: Alleen leden van deze blog kunnen een reactie posten.