forked from JakeWheat/hssqlppp
-
Notifications
You must be signed in to change notification settings - Fork 0
/
todo
386 lines (297 loc) · 14 KB
/
todo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
State of the code:
There are some big issues
1. the dialect handling for parsing is very limited
2. schema support and general catalog operations are limited
3. typechecking of query exprs is limited, and other dml and ddl is
non-existent
4. there are lots of postgres specific things baked into the system
5. the typeconversion code, which handles overloaded function
resolution, implicit casting, most of the typing of literals, and
most of determining the precision, scale and nuillability of
expressions is a complete nightmare
6. typechecking for ansi, sql server and oracle dialects are very
limited and sql server and oracle are very incorrect
7. the tests are very limited
8. a lot of the parsing code is very crufty and/or suspect
9. the syntax design is a bit poor
10. some of the code is too coupled (especially between modules)
Immediate in progress/provisional soon to be started tasks
new catalog/ new typeid
includes a bunch of feature updates:
proper schemas
proper unquoted id handling
character sets and collations
more checking on catalog updates/ddl + nice error messages
(will be linked with ddl syntax eventually)
restrict and cascade behaviour
roles and ownership (will be extended to permissions eventually)
integrate new catalog
better tests, complete catalogs for ansi and sql server
better tests for matching, nullability, precision, literals
copy syntax and parsing from simple-sql-parser
Long term tasks
0. minor issues
put the special operators like between and substring in a separate
namespace in the catalog and typechecking so they can co-exist with
functions of the same name
implicit case uses type. The syntax should not depend on this, maybe
replace with something just specific to implicit cast
full dialect type is used all the way through the internals. it is
nice to have a single dialect type for the user, but to reduce couple
internally, there should be minimal dialect types for each of
lexing,parsing,pretty printing and typechecking. Maybe the
lexing,parsing and pretty printing will be the same type, but don't
want the parser to depend on all the code that the typechecking part
of the dialect depends on
the annotation type pulls in a bunch of stuff into the syntax. can the
tree be parameterized on the annotation type? can the annotation type
be implemented in a separate module at least? it uses some ag stuff at
the moment
rename some of the internal modules, types, ctors and functions,
etc. to be more clear and consistent.
fix the catalogs and tests for oracle and sql server so they are not
based on postresql, which makes them very wrong
the dialects used in the tests are a bit confused because of legacy
issues. The tests need to be more methodical anyway.
the odbc catalog does not attach onto the current dialect
properly. the typechecker should access the odbc via the dialect and
not directly. Odbc support both for parsing and typechecking should be
a flag on the dialects or something (and disabled by default). The
odbc typechecking should work in different dialects.
literals don't directly type as unknown. This should be fixed at least
for postgres, and probably is a good idea for at least some of the
other dialects.
try to do code coverage
try to set up continuous integration somewhere - mainly want to catch
build failures with older ghcs
can also do packdeps
also: much better testing. big weak points:
we don't test nullability, precision and scale much, or the typing of literals
want some more tests on unusual dialects (e.g use numeric and have no
int types, dialect only has one text type and it isn't char or
varchar, dialect doesn't have a decimal type (affects number literals
for instance)
the parser tests are much less comprehensive that in the
simple-sql-parser project
there aren't many anomaly tests
for instance, bad typing
using sql which doesn't work in the current dialect/catalog
there are also a bunch of places where the error messages can be made much nicer
for instance:
when you don't match a function it can provide similarly named
functions, and if you got the name right, it can explain exactly
what the overloads are, what your types are, and why it didn't
match any of them (e.g. non are valid for the types, or it is
ambiguous)
can also consider the similar names for identifiers too
can the overload matching/implicit casting be tested by generating
lots of test cases using another dbms like postgres and sql server?
add more qualified imports and explicit import lists
get rid of the remaining calls to error
expose internals only via modules called ...Internal. There are a few
internal functions expose for utils or testing. These should be kept
more separate
more work with strings
1. data types
(ansi has clear restrictions/differences between char/varchar/clob
and nchar/nvarchar/nclob), postgres more or less just has text only
(I think char and varchar in postgres are now more or less sugar
around text).
2. character sets (we will treat a character set as an encoding)
3. collations
every string will carry a character set and optional collation with it
(collations have the default/implicit/explicit tag)
a dialect can specify that the character set of char,varchar,clob must
be from a subset of the character sets available, and nchar,
nvarchar is another subset, or it can allow any character set for
any string data type.
you set the default character set, plus you can set the character set
for a table, plus you can set the character set per column, and you
can convert text from one character set to another. We will
consider character sets as encodings, character repertoires will
not have explicit representation in hssqlppp (you can either
convert between two character sets or you can't, and the
availability of a conversion is always explicit and direct).
use the ansi rules for collations: each character set has a default
collation. Not sure if this can be changed, or changed per schema
for instance. Use the default/implicit/explicit collation rules for
expressions. table columns can have an explicit collation, and you
can use the collate operator in expressions (including group by and
order by).
information about:
character sets
datatype/character set compatibility
maybe each datatype can have All | Whitelist [characterset] |
Blacklist [characterset]
collations + which character set a collation is for
default collation for a character set
character sets and collations for table columns and for view
expression types
anything else (like domains, ...)
will go in the catalog
1. rewrite the catalog code
support schemas and schema search path properly
support some more object types
help fix the confusion with the type Type
support dependencies and cascade/restrict
make sure the canonicalization of names, and case handling is done
properly
correctness handling of ddl operations
support drop and alter directly
good test coverage directly on the catalog api
understand some of the dialect issues wrt catalog stuff better
handle the odbc option better - this needs some interaction with the
general catalog stuff
later want to handle permissions to some extent. not sure exactly what
to do here or if the code will be in hssqlppp or sqream. We need at
least parsing and catalog support for registering permissions here,
even if the authorization code itself might not be here
2. rewrite the typechecking code from scratch
it is a real mess
lots of things missing
it needs to be much more literate because it is weird, so that
programmers who don't know uuagc stand a chance of working with this
code
there should be much clearer way of dealing with nullability,
precision and scale
more complete typechecking for query exprs
complete typechecking for other dml and for ddl added
there are some tree rewrites which can be done during
typechecking. These should be refactored as separate passes instead of
it being tied together
the typechecking should be better as a compiler front end. The key
addition is tracking where identifiers are defined, so e.g. we can
easily tell which schema a table has matched to, or if a subquery is
correlated.
the typechecker could also check that asts don't contain syntax which
isn't supported in the current dialect
3. rewrite the typeconversion code and the general nullability,
precision and scale stuff
this is even worse mess
get rid of the hacks and special cases for e.g. datepart
I think the current code tries to be too rule based. For simple
reoccuring patterns, we can use some simple rules. Other than that, I
think passing in functions to e.g. determine the output precision is
better. The default rules and the special case functions should be
connected to the catalog functions in the dialect files right next to
the catalog definitions, instead of being embedded in the type
conversion code itself.
4. simple-sql-parser
there is a bunch of improved syntax and parsing code (and pretty
printing) in the simple-sql-parser project. there can be some copy
pasting into hssqlppp for now, and eventually they should be
synchronized and kept the same (somehow - we can't use
simple-sql-parser directly or use the source code since we must use
.ag in hssqlppp and I definitely don't want to use this in
simple-sql-parser).
5. improve the project documentation and examples
= later tasks
think about demo code to convert between dialects (especially things
like types and functions which need to be desugared)
try to get the old chaos sql preprocessor working again
improve the quasiquotation system: maybe also switch from text back to
strings (apart from the input to parsing, and the output from pretty
printing)
typesafe access and composing sql expressions from haskell
work on ansi procedural sql + get the typechecking for procedural sql
working again. Probably the most important dialect here is TSQL, but
maybe plsql is also important (the plpgsql work should be revived also
since the syntax is already there).
synchronization with simple-sql-parser
the syntaxes should be the same. Maybe there can be a mechanical
conversion from the simple-sql-parser haskell syntax to the uuagc in
hssqlppp
simple-sql-parser should have dialect support and annotations for this
to happen.
can't share the code or use simple-sql-parser from hssqlppp because
the syntax in hssqlppp must be written in ag, and simple-sql-parser it
should not use this so that it is much more accessible.
maybe replace parsec with megaparsec
maybe replace it with uu-parsinglib (has incremental parsing which is
really nice, and also maybe better error messages, and also a better
way to handle left factoring - I think it just does it automatically
more or less)
use wl-pprint-text or something - maybe this will be better for the
complex and identation heavy syntax?
do proper solution to operator precedence parsing. would like to do an
ast pass approach if possible. This is also needed at least for from
clauses and set operators too, both of which almost certainly get the
fixity wrong at the moment.
syntax extensibility?
lint for sql with plugins
parameterized annotation
------------------------
old notes
== typechecking
param query type fn
rough todo/checklist for exprs:
combinequeryexpr
values
withqueryexpr, withquery
jointref variations
join onexprs
funtrefs
table aliases
aggregates, windows
+ add checks for aggregates/group by
liftapp
inlist, inqueryexpr?
go through old typecheck tests and reinsert most of them?
-> want to get much better coverage of typechecking
start looking at getting the type errors back up to the old level
to serve effectively as a front end, the parser should produce nice
error messages, and the typechecking of crud should be very
comprehensive
what will useable non ascii character set support take? Maybe this
works fine already?
== small fixes
see if can fix error messages from lex/parse in qq: come out all with
the string passed through show so can't read the error message
better approach to parsing selects - the into stuff is a mess
alter the postprocess for astinternal.hs: want to add haddock line for
the data type and each ctor for all the data defs in the file, but
not recordize them
review examples: add new examples, clean up code, and example page on
website. Add simple parsing and typechecking example to index page
rename examples
review docs, esp. haddock which contains all sorts of shockingly bad
typos, spellings, inconsistencies, etc.
junk tests to get working: extensions, roundtrip?
want to be ready to do comprehensive review of pg syntax support for
0.6.0, so can work through and get a reasonably comprehensive list
of what is missing
documentation:
easy way to put in some sql, see if it parses, view the resultant
ast
same with typechecking: show writing a cat by hand, and examples to
generate from postgresql database and from ddl sql
== misc small bits
idea for :: cast
-> parse typenames as scalarexpressions by embeding them in
numberlits (hacky but should be unambiguous) - or antictors?
then transform to cast after parsing. This can then use the proper
precedence in buildExpressionParser to parse ::. Also produce parse
errors if after parsing, you try to cast to something which isn't a
typename
could do something similar for other operators: '.', [], in?
add support for enum types
== website nav
== examples
add compilation of examples to automated tests, also add tests in the
documentation
== report generator
the idea is to have the following for experimentation, evaluate how
well hssqlppp supports some existing sql, support while developing
sql (possibly with syntax extensions), and generating
documentation:
take source sql:
standard postgresql sql in text files
sql taken from postgresql dump from live db
syntax extended sql in text files
do some or all of the following:
parse and type check - report problems
parse, pretty print, reparse and check
generate documentation, catalog
load into postgresql and double check catalog from typechecker
load and dump postgresql, reparse and typecheck for changes
== documentation generator for sql codebases