Excel/Open Office to SQLite

List overview All Threads
Download

newer

older

(OT) CSS Question, was Re: CSS...

Re: [ALUG] Bring Your Own Server

Chris Walker

12 Jun 2011 12 Jun '11

11:14 a.m.

I have a spreadsheet which is currently in OpenOffice ods format but obviously can easily be saved in another format should I need to.

The spreadsheet holds roughly 6200 rows with 8 columns. I'd like to import 6 of those columns into an SQLite database and more to the point, do it automatically/programmatically rather than by hand.

There is another difficulty (it's difficult for me, it might not be for you!) is that I want the tables to be created from the names in column 3. Some of those columns are numerical and SQLite doesn't appear to like that.

The columns look like this :- Name1 Number Name2 Y/N Where Notes Unwanted1 & Unwanted2

For Name2, the rows repeat so there might be half a dozen of the same names, like 1648, 8Sail, Abbeydale against the rows in Name1.

Is there somehow I can run a program, Perl, Python or whatever to extract the data and automatically squirt it into a database?

I've searched online for something to help and did grab a Python script but it doesn't work for me and as my Python skills, like all my other programming skills amount to zero, I'm stuck, hence the question here.

I'd appreciate any help or pointers you could give.

-- __ __| |_ __ __ .----------------------------------------------. / _/ _` \ V V / | mailto:cdw_alug@the-walker-household.co.uk | ____,_|_/_/ |______________________________________________|

Show replies by date

ted.harding＠wlandres.net

12 Jun 12 Jun

11:52 a.m.

On 12-Jun-11 11:14:10, Chris Walker wrote:

...

I have a spreadsheet which is currently in OpenOffice ods format but obviously can easily be saved in another format should I need to.

The spreadsheet holds roughly 6200 rows with 8 columns. I'd like to import 6 of those columns into an SQLite database and more to the point, do it automatically/programmatically rather than by hand.

There is another difficulty (it's difficult for me, it might not be for you!) is that I want the tables to be created from the names in column 3. Some of those columns are numerical and SQLite doesn't appear to like that.

The columns look like this :- Name1 Number Name2 Y/N Where Notes Unwanted1 & Unwanted2

For Name2, the rows repeat so there might be half a dozen of the same names, like 1648, 8Sail, Abbeydale against the rows in Name1.

Is there somehow I can run a program, Perl, Python or whatever to extract the data and automatically squirt it into a database?

I've searched online for something to help and did grab a Python script but it doesn't work for me and as my Python skills, like all my other programming skills amount to zero, I'm stuck, hence the question here.

I'd appreciate any help or pointers you could give.

Can you elaborate on "I want the tables to be created from the names in column 3"?

Basically, this looks as though it would be a smooth little job using 'awk', with some backup from a simple shell wrapper, but the details would depend on how you want the extraction to come out at the end of the day. An example, maybe?

Ted.

-------------------------------------------------------------------- E-Mail: (Ted Harding) ted.harding@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 12-Jun-11 Time: 12:52:03 ------------------------------ XFMail ------------------------------

Chris Walker

13 Jun 13 Jun

8:59 a.m.

On 12/06/11 12:52, (Ted Harding) wrote:

...

On 12-Jun-11 11:14:10, Chris Walker wrote:

...
I have a spreadsheet which is currently in OpenOffice ods format but obviously can easily be saved in another format should I need to.

The spreadsheet holds roughly 6200 rows with 8 columns. I'd like to import 6 of those columns into an SQLite database and more to the point, do it automatically/programmatically rather than by hand.

There is another difficulty (it's difficult for me, it might not be for you!) is that I want the tables to be created from the names in column 3. Some of those columns are numerical and SQLite doesn't appear to like that.

The columns look like this :- Name1 Number Name2 Y/N Where Notes Unwanted1& Unwanted2

For Name2, the rows repeat so there might be half a dozen of the same names, like 1648, 8Sail, Abbeydale against the rows in Name1.

[snip]

...

Can you elaborate on "I want the tables to be created from the names in column 3"?

This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.

I want the table to be called the Brewery Name e.g. Woodfordes.

...

Basically, this looks as though it would be a smooth little job using 'awk', with some backup from a simple shell wrapper, but the details would depend on how you want the extraction to come out at the end of the day. An example, maybe?

I hope my example above illustrates what I want to do.

-- __ __| |_ __ __ .----------------------------------------------. / _/ _` \ V V / | mailto:cdw_alug@the-walker-household.co.uk | ____,_|_/_/ |______________________________________________|

richard.lee.parsons＠gmail.com

9:20 a.m.

Chris Walker cdw_alug@the-walker-household.co.uk wrote:

...

...
Can you elaborate on "I want the tables to be created from the names in column 3"?

This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.

So you want multiple tables, with each tables name being the Brewery Name?

Why not have one big table, with the Brewery Name as a column?

Richard

Chris Walker

9:25 a.m.

On 13/06/11 10:20, Richard Parsons wrote:

...

Chris Walkercdw_alug@the-walker-household.co.uk wrote:

...
...
Can you elaborate on "I want the tables to be created from the names in column 3"?

This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.

So you want multiple tables, with each tables name being the Brewery Name?

Why not have one big table, with the Brewery Name as a column?

I already have a big table of Brewery Names. I then want to be able to click on the Brewery Name and have SQL display a list of their beers.

Here I want to extract a list of the beers (plus other data) from the OO spreadsheet.

-- __ __| |_ __ __ .----------------------------------------------. / _/ _` \ V V / | mailto:cdw_alug@the-walker-household.co.uk | ____,_|_/_/ |______________________________________________|

richard.lee.parsons＠gmail.com

9:39 a.m.

Chris Walker cdw_alug@the-walker-household.co.uk wrote:

...

On 13/06/11 10:20, Richard Parsons wrote:

...
Chris Walkercdw_alug@the-walker-household.co.uk wrote:

...
...
Can you elaborate on "I want the tables to be created from the names in column 3"?

This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.

So you want multiple tables, with each tables name being the Brewery Name?

Why not have one big table, with the Brewery Name as a column?

I already have a big table of Brewery Names. I then want to be able to click on the Brewery Name and have SQL display a list of their beers.

Here I want to extract a list of the beers (plus other data) from the OO spreadsheet.

Maybe, you need to re-think your proposed database structure. I'm no expert, but when you add data (like another Brewery Name) to a database you shouldn't need to add a new table. Have a look at this article which I think may help: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

Thanks Richard

Chris Walker

9:47 a.m.

On 13/06/11 10:39, Richard Parsons wrote:

...

Maybe, you need to re-think your proposed database structure. I'm no expert, but when you add data (like another Brewery Name) to a database you shouldn't need to add a new table. Have a look at this article which I think may help: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

You probably hit the nail on the head with your first comment :-)

I'll take a look at the site. Thanks for that.

-- __ __| |_ __ __ .----------------------------------------------. / _/ _` \ V V / | mailto:cdw_alug@the-walker-household.co.uk | ____,_|_/_/ |______________________________________________|

Brett Parker

10:06 a.m.

On 13 Jun 10:39, Richard Parsons wrote:

...

Chris Walker cdw_alug@the-walker-household.co.uk wrote:

...
On 13/06/11 10:20, Richard Parsons wrote:

...
Chris Walkercdw_alug@the-walker-household.co.uk wrote:

...
...
Can you elaborate on "I want the tables to be created from the names in column 3"?

This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.

So you want multiple tables, with each tables name being the Brewery Name?

Why not have one big table, with the Brewery Name as a column?

I already have a big table of Brewery Names. I then want to be able to click on the Brewery Name and have SQL display a list of their beers.

Here I want to extract a list of the beers (plus other data) from the OO spreadsheet.

Maybe, you need to re-think your proposed database structure. I'm no expert, but when you add data (like another Brewery Name) to a database you shouldn't need to add a new table. Have a look at this article which I think may help: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

So, what I'd be looking at is:

Table : Brewery Columns: id name description

Table : Beer Columns: id brewery_id name description

Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.

This is normal database normalisation.

Cheers,

-- Brett Parker

richard.lee.parsons＠gmail.com

10:37 a.m.

Brett Parker iDunno@sommitrealweird.co.uk wrote:

...

So, what I'd be looking at is:

Table  : Brewery
Columns: id
         name
         description

Table  : Beer
Columns: id
         brewery_id
         name
         description

That looks much better to me. Now you need to write some sql that creates that database. Have a look at the sqlite documentation for that.

...

Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.

This is normal database normalisation.

That's a mouthful. :)

Now back to your original question. Can you convert the spreadsheet to csv? That will make it easier to parse. Perhaps then give the first few lines of that csv, so we can help with parsing it.

Also, which scripting language do you want to do this in? Bash is probably easiest for me. Python will do it easily enough. Perl can do it, but I don't know perl (others on the list probably do). Someone suggested awk: that would probably be neater that bash/python, but I'm sure not as experienced in it. I'm sure there's lots more options (ruby etc).

Richard

Brett Parker

11:02 a.m.

On 13 Jun 11:37, Richard Parsons wrote:

...

Brett Parker iDunno@sommitrealweird.co.uk wrote:

...
So, what I'd be looking at is:
Table  : Brewery
Columns: id
         name
         description

Table  : Beer
Columns: id
         brewery_id
         name
         description
That looks much better to me. Now you need to write some sql that creates that database. Have a look at the sqlite documentation for that.

...
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.

This is normal database normalisation.

That's a mouthful. :)

Now back to your original question. Can you convert the spreadsheet to csv? That will make it easier to parse. Perhaps then give the first few lines of that csv, so we can help with parsing it.

Also, which scripting language do you want to do this in? Bash is probably easiest for me. Python will do it easily enough. Perl can do it, but I don't know perl (others on the list probably do). Someone suggested awk: that would probably be neater that bash/python, but I'm sure not as experienced in it. I'm sure there's lots more options (ruby etc).

I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser, and reinvention of wheels is bad :)

It appears that there are python modules for interactive with openoffice, but I've not played with them yet, so I can't recommend them. Certainly previously when faced with a similar situation I've gone the csv and python route because it was the path of least resistance (and I was in a hurry...)

-- Brett Parker

ted.harding＠wlandres.net

11:35 a.m.

On 13-Jun-11 11:02:16, Brett Parker wrote:

...

On 13 Jun 11:37, Richard Parsons wrote:

...
Brett Parker iDunno@sommitrealweird.co.uk wrote:

...
So, what I'd be looking at is:
Table  : Brewery
Columns: id
         name
         description

Table  : Beer
Columns: id
         brewery_id
         name
         description
That looks much better to me. Now you need to write some sql that creates that database. Have a look at the sqlite documentation for that.

...
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.

This is normal database normalisation.

That's a mouthful. :)

Now back to your original question. Can you convert the spreadsheet to csv? That will make it easier to parse. Perhaps then give the first few lines of that csv, so we can help with parsing it.

Also, which scripting language do you want to do this in? Bash is probably easiest for me. Python will do it easily enough. Perl can do it, but I don't know perl (others on the list probably do). Someone suggested awk: that would probably be neater that bash/python, but I'm sure not as experienced in it. I'm sure there's lots more options (ruby etc).
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser,

Err, I respectfully disagree ... (in other words, b*****ks)!

Copy the following into a console:

awk 'BEGIN{FS=","; i=0}; {i=i+1; print $i}' << EOT Row 1,Field 1.1,Field 1.2,Field 1.3 Row 2,Field 2.1,Field 2.2,Field 2.3 Row 3,Field 3.1,Field 3.2,Field 3.3 Row 4,Field 4.1,Field 4.2,Field 4.3 EOT

and you will get:

Row 1 Field 2.1 Field 3.2 Field 4.3

(printing field i=1 from line 1, field i=2 from line 2, ... ).

So awk has parsed the CSV! The trick is to set the Field Separator (FS=",") to comma. After that you can do what you like.

Ted.

...

and reinvention of wheels is bad :)

It appears that there are python modules for interactive with openoffice, but I've not played with them yet, so I can't recommend them. Certainly previously when faced with a similar situation I've gone the csv and python route because it was the path of least resistance (and I was in a hurry...) -- Brett Parker

-------------------------------------------------------------------- E-Mail: (Ted Harding) ted.harding@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 13-Jun-11 Time: 12:35:30 ------------------------------ XFMail ------------------------------

Brett Parker

11:54 a.m.

On 13 Jun 12:35, Ted Harding wrote:

...

On 13-Jun-11 11:02:16, Brett Parker wrote:

...
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser,

Err, I respectfully disagree ... (in other words, b*****ks)!

Copy the following into a console:

awk 'BEGIN{FS=","; i=0}; {i=i+1; print $i}' << EOT Row 1,Field 1.1,Field 1.2,Field 1.3 Row 2,Field 2.1,Field 2.2,Field 2.3 Row 3,Field 3.1,Field 3.2,Field 3.3 Row 4,Field 4.1,Field 4.2,Field 4.3 EOT

and you will get:

Row 1 Field 2.1 Field 3.2 Field 4.3

(printing field i=1 from line 1, field i=2 from line 2, ... ).

So awk has parsed the CSV! The trick is to set the Field Separator (FS=",") to comma. After that you can do what you like.

I respectfully call you a fucking moron and ask you what happens if your csv does something like:

"Row 1","This is a description, it's quite long and contains commas",97 "Row 2","This is a different description without commas",12

Now, please, ffs, learn that just because it's comma seperated doesn't mean that the field doesn't contain commas. The number of times I've sat and had to debug someone elses shitty code because rather than using a csv library that's been tested they decided that they could just split on commas is no longer funny. And when people turn up on a list with a lot of technical people on it suggesting doing just this it makes me Very Angry. And you don't want to see me angry.

-- Brett Parker

Chris Walker

12:57 p.m.

On 13/06/11 12:54, Brett Parker wrote:

...

On 13 Jun 12:35, Ted Harding wrote:

...
On 13-Jun-11 11:02:16, Brett Parker wrote:

...
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser,

Err, I respectfully disagree ... (in other words, b*****ks)!

[snip]

...

I respectfully call you a fucking moron and ask you what happens if your csv does something like:

People, please. I'm glad I started this now!

...

on commas is no longer funny. And when people turn up on a list with a lot of technical people on it suggesting doing just this it makes me Very Angry. And you don't want to see me angry.

Do you turn green? ;-)

I always preferred TSV or fixed width data rather than CSV because of the possibility of the data containing commas. Mine does here.

I'm away from the keyboard this afternoon and when I return I'll be looking at the website that Richard mentioned.

While I'm away, can you play nicely please?

-- __ __| |_ __ __ .----------------------------------------------. / _/ _` \ V V / | mailto:cdw_alug@the-walker-household.co.uk | ____,_|_/_/ |______________________________________________|

Brett Parker

1:18 p.m.

On 13 Jun 13:57, Chris Walker wrote:

...

On 13/06/11 12:54, Brett Parker wrote:

...
On 13 Jun 12:35, Ted Harding wrote:

...
On 13-Jun-11 11:02:16, Brett Parker wrote:

...
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser,

Err, I respectfully disagree ... (in other words, b*****ks)!

[snip]

...
I respectfully call you a fucking moron and ask you what happens if your csv does something like:

People, please. I'm glad I started this now!

Well, you know, I take exception to people telling me "b*****ks" and give an example that fails on a trivial dataset...

...

...
on commas is no longer funny. And when people turn up on a list with a lot of technical people on it suggesting doing just this it makes me Very Angry. And you don't want to see me angry.

Do you turn green? ;-)

I always preferred TSV or fixed width data rather than CSV because of the possibility of the data containing commas. Mine does here.

Same problem, content can contain tabs... this is why the CSV spec ain't small, and actually covers many different delimeters, and what to do with them...

And as you've just proven my point (the content does contain commas...) I think it was a justified response :)

...

I'm away from the keyboard this afternoon and when I return I'll be looking at the website that Richard mentioned.

While I'm away, can you play nicely please?

I always play nice. It just might not seem that way occasionally... I'm saving hours of debugging and trying to work out "why did that break like that?" when given a non-trivial dataset.

-- Brett Parker

Chris Walker

12:50 p.m.

On 13/06/11 11:37, Richard Parsons wrote:

...

Brett ParkeriDunno@sommitrealweird.co.uk wrote:

...
So, what I'd be looking at is:
 Table  : Brewery
 Columns: id
          name
          description

 Table  : Beer
 Columns: id
          brewery_id
          name
          description
That looks much better to me. Now you need to write some sql that creates that database. Have a look at the sqlite documentation for that.

Writing the code would be a problem for me, so, I cheat! I use this - http://sourceforge.net/projects/sqlitebrowser/ under WindowsXP running in VirtualBox.

...

...
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.

This is normal database normalisation.

That's a mouthful. :)

Now back to your original question. Can you convert the spreadsheet to csv? That will make it easier to parse. Perhaps then give the first few lines of that csv, so we can help with parsing it.

Yes I can, but having seen the 'discussion' that erupted afterwards, I'd say TSV is preferred. Like others, I spent ages trying to fix data import for a certain large insurance company for mailshots.

...

Also, which scripting language do you want to do this in? Bash is probably easiest for me. Python will do it easily enough. Perl can do it, but I don't know perl (others on the list probably do). Someone suggested awk: that would probably be neater that bash/python, but I'm sure not as experienced in it. I'm sure there's lots more options (ruby etc).

I *might* be able to cope with Python. I have no experience of Perl but Python having similar structures to BBC Basic - my only programming experience - I'd be prepared to give that a go.

-- __ __| |_ __ __ .----------------------------------------------. / _/ _` \ V V / | mailto:cdw_alug@the-walker-household.co.uk | ____,_|_/_/ |______________________________________________|

Chris Walker

15 Jun 15 Jun

11:33 a.m.

On 13/06/11 11:06, Brett Parker wrote:

...

On 13 Jun 10:39, Richard Parsons wrote:

...

...
Maybe, you need to re-think your proposed database structure. I'm no expert, but when you add data (like another Brewery Name) to a database you shouldn't need to add a new table. Have a look at this article which I think may help: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

So, what I'd be looking at is:
 Table  : Brewery
 Columns: id
          name
          description

 Table  : Beer
 Columns: id
          brewery_id
          name
          description
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.

This is normal database normalisation.

I've created a couple of tables. This what they look like (csv output from the creator program) Brewery :- "ID","Name","Web_URL","Address"

ID is the 'TEXT_PRIMARY_KEY' and the rest of the fields are set as text.

Beers :- "ID","Beer_Name","Brewery_ID","ABV","Consumed_Yes_No","Where_Consumed","Tasting_Notes"

Again, ID is the 'TEXT_PRIMARY_KEY' and the rest of the fields are set as text. I wondered if I should have set Consumed_Yes_No as a BOOL element but I can't see how to do that in the program I'm using. Is SQL itself better at that or is it unnecessary?

So the first real question is, have I done that correctly? I wondered if I need an ID in the Beers table or could the Beer_Name be used as that? I realise though that lots of beers have the same name and the web site cautions against that.

Second question. Do I link the relationship between the tables in SQL code or in the program code that will make use of this data? I can see how to do the latter from this section of code :-

model->setTable("employee");

model->setRelation(2, QSqlRelation("city", "id", "name")); model->setRelation(3, QSqlRelation("country", "id", "name"));

The code is taken from one of the examples used here - http://doc.qt.nokia.com/4.7-snapshot/qsqlrelationaltablemodel.html

-- __ __| |_ __ __ .----------------------------------------------. / _/ _` \ V V / | mailto:cdw_alug@the-walker-household.co.uk | ____,_|_/_/ |______________________________________________|

Chris Walker

13 Jun 13 Jun

9:58 a.m.

On 13/06/11 10:20, Richard Parsons wrote:

...

Chris Walkercdw_alug@the-walker-household.co.uk wrote:

...
...
Can you elaborate on "I want the tables to be created from the names in column 3"?

This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.

So you want multiple tables, with each tables name being the Brewery Name?

That was my initial thought but I'm wading through the advice on the site you gave and now I'm far from sure that I'm doing things the right way at all.

...

Why not have one big table, with the Brewery Name as a column?

Give me a little while to digest the info on the web site, and I'll come back with some ideas, based on what I *think* I've learnt.

-- __ __| |_ __ __ .----------------------------------------------. / _/ _` \ V V / | mailto:cdw_alug@the-walker-household.co.uk | ____,_|_/_/ |______________________________________________|

5131

Age (days ago)

5134

Last active (days ago)

main@lists.alug.org.uk

16 comments

4 participants

tags (0)

participants (4)

Brett Parker
Chris Walker
richard.lee.parsons＠gmail.com
ted.harding＠wlandres.net