Cross Join
I remember learning about cross joins (aka Cartesian products) in 1990/1 during my first database course. We were using some research project from Massey University which had produced a RDBMS which ran in DOS and supported SQL1.
My question at the time was – what would be the use of a cross join in a practical system? In my time I’ve occasionally run a cross join by accident (forgetting a where clause or screwing up the join condition) – sometimes even on production systems with very large databases – whoops!
Today – I finally got to use one to solve a business problem! Here’s the scenario (bad are words censored):
bad_words comments
+---------------+ +-------+------------+------+ | word | | id | comments | live | +---------------+ +-------+------------+------+ | W*NK*RS | | 1 | a comm... | 1 | | B*ST*RDS | | 2 | another... | 1 | | ARS* | | 36000 | lots... | 1 | | M*TH*RF*CK*RS | +-------+------------+------+ | C*NTS | 36,000 rows | C*CKS | | etc. | +---------------+ 648 rows
Two tables – one with a list of words which should not appear in the other table. The table to be scanned actually has three columns which need to be checked.
What I did was cross join both tables (resulting in 23,328,000 rows) and then use the where clause to filter the out the rows which don’t contain at least one bad word.
INSERT INTO bad
SELECT DISTINCT c.id
FROM comments c, bad_words w
WHERE
pros LIKE CONCAT('%',w.word,'%')
OR cons LIKE CONCAT('%',w.word,'%')
OR comments LIKE CONCAT('%',w.word,'%')
Note: It actually got more complicated than this – we needed to match on word boundaries – which meant using a regex – but the principle was the same.
- This was impressive on a number of levels – state of the art for DOS was dbaseIII which didn’t support SQL but had some twisted language all it’s own. [↩]
No Comments Yet