June 2008
Sun Mon Tue Wed Thu Fri Sat
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30          
Search

 
Catagories
Archives
Recent Entries
Links
RSS
deleting duplicates (more)
Catagory: sql server · This Entry · Comment(0) · eMail entry · Google
November 19, 2004 11:39 AM

sql server

here's a version uses a self join, on a derivative table instead of a view... this example uses "mytableid" to distinguish records (by adding a index there) - but you might have used "mydatefield" in a similar manner (assuming the date field differentiates the records, and you adjust the comparisons around accordingly).



step [1], finding duplicates
select count(1) as [dups], field1, field2, field3 
  from mytable a with (nolock)
  group by field1, field2, field3
  having count(1) > 1
  order by count(1) DESC
  compute sum(count(1))


step [2], review data
--select version...

select x.field1, x.field2, x.field3, x.mydatefield, x.mytableid
from mytable x 
inner join 
	-- dataset to evaluate... 
	(select distinct a.field1, a.field2, a.field3, a.mydatefield, a.mytableid 
		from mytable a 
		inner join mytable b on 
		-- duplicate details...
			a.field1 = b.field1
			and a.field2 = b.field2
			and a.field3 = b.field3
			and a.mydatefield = b.mydatefield
		-- distinguishing details...
		where a.mytableid > b.mytableid
	) as t
	on 
	-- join to match records...
	x.field1 = t.field1
	and x.field2 = t.field2
	and x.field3 = t.field3
	and x.mydatefield = t.mydatefield
	and x.mytableid = t.mytableid

step [3] delete duplicates
--delete version... 

delete mytable  
from mytable x 
inner join
	-- dataset to evaluate...
	(select distinct a.field1, a.field2, a.field3, a.mydatefield, a.mytableid 
		from mytable a 
		inner join mytable b on 
		-- duplicate details...
			a.field1 = b.field1
			and a.field2 = b.field2
			and a.field3 = b.field3
			and a.mydatefield = b.mydatefield
		-- distinguishing details...
		where a.mytableid > b.mytableid
	) as t
	on 
	-- join to match records...
	x.field1 = t.field1
	and x.field2 = t.field2
	and x.field3 = t.field3
	and x.mydatefield = t.mydatefield
	and x.mytableid = t.mytableid



no distinguishing rows elements?? just add one.
ALTER TABLE policyreport ADD 
/* Add a PRIMARY KEY identity column. */ 
[ID]  INT IDENTITY
CONSTRAINT [ID2] PRIMARY KEY
GO





Comments

Post a comment
Name:


Email Address:


URL:


Comments: