ecnf-format.txt

﻿Notes on database entries for elliptic curves over number fields (other than Q)

#########################################
#                                       #
#  Data stored in the ec_nfcurves table #
#                                       #
#########################################

Here we decribe the format of the data text files in the ecnf-data
repository together with all the columns of the ec_nfcurves table in
the LMFDB.  The underlying principle is that all computations needed
for the table contents should be done in creating the data text files
and stored there, so that the upload process only does trivial data
manipulation and (in particular) does not need to construct any number
fields or elliptic curves in Sage; so the table contents can be
recreated quickly from te data text files, which are under revision
control.

In the postgresql database there is a table ec_curves, containing
elliptic curves over Q only, and ec_nfcurves, which contains curves
over a number field other than Q.  This document only concerns the
latter.

The table has several columns which we group for convenience here into
two parts: field keys and curve keys.  The field keys identify the
field K over which the curve E is defined, with enough detail so that
searches can be done for curves over specific fields or (for example)
over real quadratic fields.  The curve fields identify the curve and
contains various invariants and properties of the curve.  In some
cases some of these will not be known, in which case they contain None
(or its postgresql equivalent); the code which deals with this and
creates web pages must be aware of this partial data and handle it
properly.  Some columns are isogeny invariant, but are still stored
with every curve in the isogeny class. [Except for isogeny_matrix,
currently only stored with the first curve each class.]

All columns:  1 id + 4 field + 36 curve = 41 (before Oct 2020)
              1 id + 4 field + 46 curve = 51 (from   Oct 2020)

Every data text file contains the label components (field_label,
conductor_label, iso_label, number).

FIELD COLUMNS (4 columns)

field_label     text                 2.2.5.1
degree          smallint             2
signature       jsonb [int,int]      [2,0]
abs_disc        bigint               5

Definition of field_label: standard LMFDB number field label, with
four components separated by ".": degree, #real embeddings, abs value
of discriminant, index number (for when the first three do not
uniquely identify the field).

On upload the last 3 of these columns are extracted from the field
label.

CURVE COLUMNS (36 = 16+5+3+10+2 old columns) (+ 'id' column)
              (46 = 16+6+8+14+2 new columns) (+ 'id' column)

We subdivide these according to which data text file they are in but
there is no such distinction in the table.

(1) 16 columns from the curves.* files

label              text     (curve label including field)
short_label        text     (curve label excluding field)
class_label        text     (isogeny class label including field)
short_class_label  text     (isogeny class label excluding field)
conductor_label    text     (conductor label)
iso_label          text     (letter code of isogeny class)
iso_nlabel         smallint (numeric version of previous)
conductor_ideal    text     (ideal generators)
conductor_norm     bigint   (norm of conductor)
number             smallint (number of curve in isogeny class, from 1)
ainvs              text     (a-invariants)
jinv               text     (j-invariant)
equation           text     (latex string)
cm                 integer  (CM discriminant, or 0)
base_change        jsonb    (list of text labels of curves/Q)
q_curve            boolean  (Q-curve flag)

Here,

 label = "{}-{}".format(field_label, short_label)
 short_label = "{}-{}{}".format(conductor_label, iso_label, number)
 class_label = "{}-{}".format(field_label, short_class_label)
 short_class_label = "{}-{}".format(conductor_label, iso_label)

 iso_label is a lower case base 26 representation of the number of the
 isogeny class for fixed conductor (starting from 0 and with leading
 a's removed), so that class_label exactly matches the label of a
 corresponding HMF or BMF, with the following exceptions: (i) for
 curves over 3.1.23.1 there are no matching newforms and the iso_label
 is in upper case; (ii) over IQFs, curves with CM by an order in the
 same IQF have no associated BMF and the iso_label has a prefix CM
 followed by a lower case base 26 label as for non-CM curves.

 iso_nlabel is the same as an integer starting at 0 (so a=0, b=1,
 etc), except for the special CM labels where we use negative integers
 offset by 1 (so CMa=-1, CMb=-2, etc).

 conductor_labels = "{}.{}".format(norm, index)

 where index is the ideal's index, starting from 1, in a list of all
 integral ideals of the same norm.  Over IQFs the index is determined
 using the standard sorting convention.  Over TR fields the conductor
 labels just match the level labels for the corresponding HMFs.  Over
 3.1.23.1 the order was fixed by the original Gunnells-Yasaki
 computation in Magma.

 conductor_ideal is a string from which the conductor may be
 constructed as an ideal. It is a string representing [N,a,alpha]
 where N is the norm, a the smallest positive integer in the ideal,
 and alpha is a polynomial in w such that the ideal is (a,alpha) where
 the field is Q(w).  In the data files there are no embedded spaces so
 this is one data field.  The same format is used in the database for
 prime ideals and the discriminant ideal.

 Number field elements (NFelt) will be represented in the database
 either as a single NFelt-string representing a comma-separated list
 of d=[K:Q] rationals, with no embedded spaces, or as a string
 representing a polynomial in w with rational coefficients where
 K=Q(w).  The d rationals in the list are the coefficients of the
 number field element with respect to the power basis on the generator
 w of K whose minimal polynomial is stored in the number field
 database; it is the unique polredabs polynomial defining the field.

 The a-invariants are stored as a single string, being the
 concatenation of the 5 individual NFelt-strings separated by ";".

(2) 5+1=6 Columns from isoclass.* files

class_size          smallint (size of the isogeny class)
isogeny_matrix      jsonb    (matrix of isogeny degrees)
class_deg           integer  (largest isogeny degree for this class)
isodegs             integer[] (list of degrees of isogenies from this curve to all in the class)
                              NB this was isogeny_degrees of type jsonb
trace_hash          bigint   (hash of a_p for the class)
---- 1 new col Oct 2020 ----
reducible_primes    integer[] (list of prime isogeny degrees)

Here, class_size is the size n of the isogeny class of this curve;
      isogeny_matrix is a list of n lists of n integers, the (i,j)
            entry being the degree of a cyclic isogeny between curves i and
            j in this isogeny class;
      class_deg is the largest isogeny degree in the class;
      isogeny_degrees is a list of n integers, the i'th being the
            isogeny degree between this curve and the i'th curve in
            the class;
      trace_hash is the LMFDB-standard hash function for the
            L-function of this class

Note that for isogeny classes with rational CM, the degree of a cyclic
isogeny between two isogenous curves is not uniquely determined.

(3) 3+5=8 Columns from the local_data.* files

local_data   jsonb   (list of local data entries, one per prime)
non_min_p    jsonb   (list of primes at which the stored model is non-minimal)
minD         text    (minimal discriminant ideal)
---- 5 new cols Oct 2020 ----
n_bad_primes        integer (number of primes of bad reduction)
bad_primes          jsonb   (list of primes of bad reduction)
semistable          boolean (global semistable flag)
tamagawa_product    bigint  (product of Tamagawa numbers)
potential_good_reduction boolean (integral j-invariant)

The local_data list contains an entry for each prime of bad reduction
for the model of the curve stored, which is a global minimal model
when such exists, but otherwise includes a single prime of good
reduction.  non_min_p is a list of these non-minimal primes, so is
either empty or contains just one prime.  minD is the minimal
discriminant ideal.  Both the primes in the non_min_p list and minD
are strings representing ideals, as for the conductor ideal (i.e. of
the form [N,a,alpha]).

Local data format:

In the database this is a list, with one entry per prime dividing the
discriminant of the stored model, of dicts with keys as follows:

    'p':           string: the prime ideal P as [N,a,alpha]
    'normp'        integer: norm(P)
    'ord_cond'     integer: valuation of conductor at P
    'ord_disc'     integer: valuation of minimal discriminant at P
    'ord_den_j'    integer: valuation of j-invariant denominator at P
    'red'          None or integer in {0,1,-1}: reduction type
    'rootno'       integer in {1,-1}: root number
    'kod'          string: Kodaira symbol (latex)
    'cp'           integer: Tamagawa number at P

The reduction type is None for good reduction
                       0 for additive reduction
                      +1 for split multiplicative reduction
                      -1 for nonsplit multiplicative reduction

In the local_data files, the local_data field is a single string
containing substrings for each prime, delimited by ";".  Each
substring is delimited by ":" and contains the fields in the above
order.

NB If the model is a minimal model of a curve with everywhere good
reduction, the ld string will be empty and this line in the local_data
file will have fewer columns.

(4) 10+4=14 Columns from the mwdata.* files

rank                smallint
rank_bounds         jsonb    list of 2 ints
analytic_rank       smallint
ngens               smallint
gens                jsonb    list of ngens strings representing points
heights             numeric[] list of ngens reals
reg                 numeric  real
torsion_order       smallint
torsion_structure   jsonb    list of 0,1,2 ints
torsion_gens        jsonb    list of 0,1,2 strings representing points
---- 4 new cols Sept 2020 ----
torsion_primes integer[] list of >=0 ints
omega numeric real
Lvalue numeric real
sha integer

Each point is stored as a string representing a list of 3 lists of d
rationals.  These are projective coordinates, though the z-coordinate
hould aways be 1.  In the mwdata files both the gens and torsion_gens
are in single fields as lists of points e.g. [P,Q].  In the database
they are lists of strings.

(5) 2 Columns from the galdata.* files

galois_images           jsonb  list of strings
non-surjective_primes   jsonb  list of integers

The Galois image strings are Sutherland codes for Galois images when
not maximal (i.e. non-surjective for non-CM curves, not the maximal
possible image given the base field for CM curves).  e.g. '3B.1.1',
'2Cs'.  The database also contains the list of non-maximal primes (the
prefixes of the image codes).

###################################################
#                                                 #
#  Raw data files for uploading into the database #
#                                                 #
###################################################

The data is split across 5 sets of files for each field: curves.*,
isoclass.*, local_data.*, mwdata.*, galdata.* with suffix the field
label.  In each file there is one line per curve except for isoclass.*
which has one line per isogeny class.

In each file, every line starts with the following four columns

1. field_label
2. conductor_label
3. iso_label
4. number

except that in galdata.* these are combined into a single label column
{}-{}-{}{}.

In curves.* there are 10 further columns, 14 in all:

5. conductor_ideal
6. conductor_norm
7. ainvs
8. jinv
9. disc
10. normdisc
11. equation
12. cm
13. base_change
14. q_curve

In isoclass.* there are 2 further columns, 6 in all:

5. isogeny_matrix
6. trace_hash

In local_data.* there are 2 or 3 further columns, 6 or 7 in all:

5. local_data (empty=missing if the curve has egr and a global minimal model)
6. non_min_p
7. minD

In mwdata.* there are 13 further columns, 17 in all:

5. rank
6. rank_bounds
7. analytic_rank
8. ngens
9. gens
10. heights
11. reg
12. torsion_order
13. torsion_structure
14. torsion_gens
15. omega
16. lvalue
17. sha

In galdata.* the number of extra columns is 0 or more, one per
non-maximal prime (5) above

---------------------------------------------------------------------------

Schema of table ec_nfcurves

u'field_label': 'text'
u'degree': 'smallint'
u'signature': 'jsonb'
u'abs_disc': 'bigint'
u'label': 'text'
u'short_label': 'text'
u'class_label': 'text'
u'short_class_label': 'text'
u'conductor_label': 'text'
u'iso_label': 'text'
u'iso_nlabel': 'smallint'
u'conductor_ideal': 'text'
u'conductor_norm': 'bigint'
u'number': 'smallint'
u'ainvs': 'text'
u'jinv': 'text'
u'equation': 'text'
u'cm': 'integer'
u'base_change': 'jsonb'
u'q_curve': 'boolean'
u'class_size': 'smallint'
u'isogeny_matrix': 'jsonb'
u'class_deg': 'integer'
u'isogeny_degrees': 'jsonb'
u'trace_hash': 'bigint'
u'local_data': 'jsonb'
u'non_min_p': 'jsonb'
u'minD': 'text'
u'rank': 'smallint'
u'rank_bounds': 'jsonb'
u'analytic_rank': 'smallint'
u'ngens': 'smallint'
u'gens': 'jsonb'
u'heights': 'jsonb'
u'reg': 'numeric'
u'torsion_order': 'smallint'
u'torsion_structure': 'jsonb'
u'torsion_gens': 'jsonb'
u'galois_images': 'jsonb'
u'non-surjective_primes': 'jsonb'

and

u'id': 'bigint'