Any D-BASE Guru's?

rainjacktx

Active member
I have 27,500 records in a .csv data base. When I try to import the database into my website, it is rejected because there are duplicate email addresses.

Right now, I am using excel, and have the rows sorted by email address and am scrolling through and manually identifying and removing duplicate email records. Other than going through line-by-line, is there some kind of sort command in excel that will find all of the duplicates?

I also have MS Access if I need it. I'm am not at all familiar with it, though.

Anyone? Anyone?
 

rainjacktx

Active member
smokeyjoe53":1mghj9xb said:
Try sorting by email address. At least that way you'll have all the duplicates together. Then you can go back and resort.

I have it sorted by email address already. Been working on it for an hour, and I'm only 1,000 records in. At this rate, it will be sometime Wednesday night before I get through the whole list.

I do not possess patience of that magnitude.
 

granger

Founder
Do you have MS Access? There are some tricks to it that allow you do get uniques.

But first I would make sure the Unique keys are disabled for the e-mail address field. Make sure your primary key is something that can be unique.
 

HisTeam

Member
Maybe I am missing something here, but could you not put a column as COL A and put an index number in as a formula and then import as unique records?
 

rainjacktx

Active member
HisTeam":10gdsj6k said:
Maybe I am missing something here, but could you not put a column as COL A and put an index number in as a formula and then import as unique records?

I need the email addresses. The old site allowed for duplicate emails because username and customer number (autogenerated by website) was unique. The new site keys on email addresses at import, then assigns customer number. So email addy's must be unique.
 

rainjacktx

Active member
granger":3tpfivke said:
Do you have MS Access? There are some tricks to it that allow you do get uniques.

But first I would make sure the Unique keys are disabled for the e-mail address field. Make sure your primary key is something that can be unique.

I have Access, but it is from 2000. And I am not very good at Access. Like driving in England and trying to shift gears with your left hand. I know what I need to do - but that's about as far as I can get.
 

granger

Founder
If e-mails is set to primary key, then that's why it won't import them.

Then I would import excel into Access. create a new table with same value and import with e-mails as primary key. It will then prompt to continue or lose the dups during import… and it is done.
 

granger

Founder
another way would be to create another table with the PK as a number field you add (like HisTeam said), then do a query asking for unique and then import this into the table
 

rainjacktx

Active member
granger":260gibh0 said:
another way would be to create another table with the PK as a number field you add (like HisTeam said), then do a query asking for unique and then import this into the table

So I need to set a primary key from a field that I know is unique - then use that key to remove the duplicates in the email address field?

I'm gonna wind up catching something on fire.
 

granger

Founder
Maybe I am missing something now. I thought you said the e-mails were NOT unique at the very top.

If you do not have access to the actual db, then I would try the access method. It is pretty easy.
Import table with data
delete the data from table
set emails as primary
import data to table
let it override and import
only one per email will be imported
 

rainjacktx

Active member
granger":38u9yyb0 said:
Maybe I am missing something now. I thought you said the e-mails were NOT unique at the very top.

If you do not have access to the actual db, then I would try the access method. It is pretty easy.
Import table with data
delete the data from table
set emails as primary
import data to table
let it override and import
only one per email will be imported

I have exported all the customer data from the old site. I will call it the old database. In that old database, email addy's are not unique, nor are they used as any sort of key. There are some folks who have entered the exact same email address 4 different times over the last 8 years because they forgot their username, or password, and started new accounts with the original email address.

In order to import our existing customer files into the NEW website, I am having to rename fields and delete extraneous data in the old data base. The new site will only import unique email addresses. So I am have to go through the old database and manually delete every record with an email address that shows up more than once. For example - if there are two records that share the same email address, I need to delete one of them and keep the other. If there are three, I need to delete 2, etc., etc., etc.

I will try your import table with data/delete data method.

One additional question - well, two -

1. I know nothing about access. How do I set a field as the primary key?

2. If I want to use the most recently used account associated with more than one email address, how might I do that? There is a field for last login date and time.
 

rainjacktx

Active member
granger":2j6q9ykx said:
I posted a video on how to do it. It is uploading and should be online in a few minutes…

http://www.sixmanfootball.com/stuff/avideo.mov

I realized I only have 2013 access on my computer, running in windows emulation mode, but the commands and thoughts are essentially the same...


Thanks, Granger. I got it to work. Your advice saved me about 27 hours of line-by-line editing. Outed over 2200 duplicate emails.

Still a bunch of trash records. For some reason, though - and I don't know what or who the cuplrit is - seems that Access mistook the 'dot' in 'dot' com or 'dot' whatever as a field separator so I had to delete about 50 records because they were mangled beyond saving.

I started with over 27700 records, and am going to try and import about 25500 tonight. Huge time savings. Thanks again.
 

granger

Founder
no worries. sorry about the other access issues.

someone taught me that trick many years ago. We used to use it for mailing addresses. You start the same way, create a new field Big_Address that combines address1&address2&zip
delete the data
set Big_Address as the key
then import the data again.

It flushes out all of the multiple addresses. In our case it was always family members who all had accounts and we didn't want to waste postage on sending it to all of them. Of course this assumes they typed it all in the exact same way. Now there are tricks to cut whitespace, make it all lowercase, etc., but you still get Park Hill Dr and Park HIll Drive...
 

rainjacktx

Active member
granger":vsfbctzd said:
no worries. sorry about the other access issues.

someone taught me that trick many years ago. We used to use it for mailing addresses. You start the same way, create a new field Big_Address that combines address1&address2&zip
delete the data
set Big_Address as the key
then import the data again.

It flushes out all of the multiple addresses. In our case it was always family members who all had accounts and we didn't want to waste postage on sending it to all of them. Of course this assumes they typed it all in the exact same way. Now there are tricks to cut whitespace, make it all lowercase, etc., but you still get Park Hill Dr and Park HIll Drive...

Seems to have worked for email addresses, too. 3MB, almost 4 of customer data. I clicked on the 'upload' button on the website. I have the tab minimized, and refuse to look at it out of fear of yet another error message

It's a Volusion site, which advertises here. Make sure you get your commission from them or google, or whoever it is.
 
Top