# 1 Structure of the dataset

BACI provides yearly data on bilateral trade flows at the product level. Products are identified using the Harmonized System (HS), the standard nomenclature for international trade used by most customs administrations. The Harmonized System was revised in 1992, 1996, 2002, 2007, 2012 and 2017, and we provide BACI in each of these 6 revisions:

| HS revision | Years available | File names |
|---|---|---|
| 92 | 1995-2018 | `BACI_HS92_Y<year>_V<version>.csv` |
| 96 | 1996-2018 | `BACI_HS96_Y<year>_V<version>.csv` |
| 02 | 2002-2018 | `BACI_HS02_Y<year>_V<version>.csv` |
| 07 | 2007-2018 | `BACI_HS07_Y<year>_V<version>.csv` |
| 12 | 2012-2018 | `BACI_HS12_Y<year>_V<version>.csv` |
| 17 | 2018 | `BACI_HS17_Y<year>_V<version>.csv` |

Each version of BACI is identified by the year and month of its release, in the form YYYYMM (for instance, 202001 for the January 2020 release); this is the version part of the file names.

The year part of the file names identifies the year in which the recorded trade flows took place.
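The naming pattern and the year coverage in the table can be combined into a small helper. This is a minimal sketch, not a tool distributed with BACI: the function name `baci_filename` is hypothetical, and the `FIRST_YEAR`/`LAST_YEAR` values are transcribed from the table above for the 202001 release, so later releases would need a larger `LAST_YEAR`.

```python
# First year available for each HS revision, transcribed from the table above.
# Assumption: LAST_YEAR reflects the 202001 release; later releases extend it.
FIRST_YEAR = {"92": 1995, "96": 1996, "02": 2002, "07": 2007, "12": 2012, "17": 2018}
LAST_YEAR = 2018


def baci_filename(revision: str, year: int, version: str) -> str:
    """Build a BACI file name such as BACI_HS92_Y1995_V202001.csv.

    revision: two-digit HS revision ("92", "96", "02", "07", "12" or "17")
    year:     year in which the trade flows took place
    version:  release identifier in YYYYMM form
    """
    if revision not in FIRST_YEAR:
        raise ValueError(f"unknown HS revision: {revision!r}")
    if not FIRST_YEAR[revision] <= year <= LAST_YEAR:
        raise ValueError(f"HS{revision} is not available for {year}")
    if len(version) != 6 or not version.isdigit():
        raise ValueError(f"version must be YYYYMM, got {version!r}")
    return f"BACI_HS{revision}_Y{year}_V{version}.csv"


print(baci_filename("92", 1995, "202001"))  # BACI_HS92_Y1995_V202001.csv
```

The check on `FIRST_YEAR` encodes the point made in the table: a recent revision such as HS17 cannot be requested for years before it came into use.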

Each trade flow in BACI is identified by an exporter-importer-product-year combination, for which we provide the value and the quantity of the flow. BACI therefore contains 6 variables:

| Variable | Description |
|---|---|
| t | Year |
| k | Product category (HS 6-digit code) |
| i | Exporter (ISO 3-digit country code) |
| j | Importer (ISO 3-digit country code) |
| v | Value of the trade flow (thousands of current USD) |
| q | Quantity (metric tons) |

All files are in CSV format, with commas as field delimiters and dots as decimal separators. When reading the data, we advise you not to treat the product code (k) as numeric: doing so would remove the leading zeros of the HS codes.
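A minimal sketch of reading a BACI file while keeping k as text, using only Python's standard `csv` module, which returns every field as a string so that numeric columns are converted explicitly. With pandas, the equivalent precaution is passing `dtype={"k": str}` to `read_csv`. The function names are hypothetical, the sample rows are made up, and treating an empty or `NA` quantity as missing is an assumption rather than documented BACI behavior.

```python
import csv
import io


def _to_float(field: str):
    # Assumption: an empty or "NA" quantity is treated as missing (None).
    field = field.strip()
    return None if field in ("", "NA") else float(field)


def read_baci(handle):
    """Parse BACI rows from a text handle, keeping the HS code k as a string."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "t": int(rec["t"]),
            "i": int(rec["i"]),
            "j": int(rec["j"]),
            "k": rec["k"].strip(),      # text: "010111" must not become 10111
            "v": float(rec["v"]),       # thousands of current USD
            "q": _to_float(rec["q"]),   # metric tons
        })
    return rows


# Made-up sample in the BACI layout (comma delimiter, dot decimal separator).
sample = io.StringIO("t,i,j,k,v,q\n2018,4,32,010111,12.5,3.2\n")
rows = read_baci(sample)
print(rows[0]["k"])  # 010111
```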

To save space, only strictly positive trade flows are recorded in BACI. Additional files are provided to help users decide whether a flow that does not appear in BACI corresponds to a zero trade flow or to a flow for which no information is available.

Trade flows whose value does not exceed 1,000 USD (that is, $$v \le 1$$, since values are in thousands of USD) do not appear in BACI.

# 2 Additional files

In addition to the core BACI files, we provide four additional sets of files that may be useful to BACI users:

| Name | Function |
|---|---|
| country_codes | Associates the ISO 3-digit country codes with country names |
| product_codes | Associates the HS 6-digit product codes with product names |
| zeros | Helps determine whether a flow absent from BACI is a zero trade flow or a missing value |
| reporter_reliability | Documents the reliability of each country's trade declarations |

## 2.1 Country codes

These files associate the ISO 3-digit numeric codes used in BACI with country full names and with other versions of the ISO codes (3-letter and 2-letter). They were constructed based on the metadata provided by Comtrade.

## 2.2 Product codes

These files contain lists of the product codes used in each revision of the Harmonized System, along with a description of each product. They were constructed based on the metadata provided by Comtrade.
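Attaching names to codes amounts to a dictionary lookup. A minimal sketch follows; because the column names of the country_codes and product_codes files are not listed in this section, they are passed in as parameters, and the names used in the example (`code`, `description`) are hypothetical, as is the sample content. Codes are kept as text, for the same leading-zero reason as in the main files.

```python
import csv
import io


def build_lookup(handle, code_col: str, label_col: str) -> dict:
    """Map each code (kept as text) to its label from a metadata CSV."""
    return {rec[code_col].strip(): rec[label_col].strip()
            for rec in csv.DictReader(handle)}


def label(code: str, lookup: dict, default: str = "<unknown>") -> str:
    """Return the label of a code, or a placeholder if the code is not listed."""
    return lookup.get(code, default)


# Hypothetical column names and made-up content, for illustration only.
products = io.StringIO("code,description\n010111,Horses: live pure-bred breeding\n")
product_names = build_lookup(products, "code", "description")
print(label("010111", product_names))
```

Returning a placeholder for unlisted codes makes gaps between a data file and its metadata visible instead of raising an error mid-processing.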

## 2.3 Zeros

These files indicate whether observations that are not in BACI should be treated as zero trade flows or as missing values. Because BACI records only strictly positive trade flows, a trade flow $$ijkt$$ that does not appear in BACI can mean either that there is no information on this flow (missing value) or that the flow is zero. The files contain 5 variables:

| Variable | Description |
|---|---|
| t | Year |
| i | Exporter ISO 3-digit numeric code |
| j | Importer ISO 3-digit numeric code |
| ztf1 | Zero Trade Flow dummy: takes value 1 if $$i$$ or $$j$$ has at least one non-zero trade flow (imports or exports) during the year, 0 otherwise |
| ztf2 | Alternative Zero Trade Flow dummy: takes value 1 if the dyad $$ij$$ has at least one non-zero bilateral trade flow during the year, 0 otherwise |

If ztf1 (or ztf2) takes value 1 for a given $$tij$$, this suggests that all products $$k$$ exported by $$i$$ to $$j$$ in year $$t$$ for which BACI provides no information are zero trade flows. If ztf1 (or ztf2) takes value 0, a flow missing from BACI more likely reflects an absence of information than a zero trade flow.
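The decision rule above can be written as a small classifier. This is a minimal sketch assuming the BACI flows and the zeros file have already been loaded into memory (the set of observed `(t, i, j, k)` keys and the row dictionaries are assumed structures, and the function names are hypothetical). Treating a `(t, i, j)` that is absent from the zeros file as missing is also an assumption, and the example values are made up.

```python
def index_zeros(zeros_rows, dummy="ztf1"):
    """Map (t, i, j) to the chosen zero-trade-flow dummy (ztf1 or ztf2)."""
    return {(r["t"], r["i"], r["j"]): int(r[dummy]) for r in zeros_rows}


def flow_status(observed, zeros_index, t, i, j, k):
    """Return 'observed', 'zero' or 'missing' for the flow ijkt.

    observed: set of (t, i, j, k) keys present in BACI.
    """
    if (t, i, j, k) in observed:
        return "observed"
    # Assumption: a (t, i, j) not listed in the zeros file is treated as missing.
    if zeros_index.get((t, i, j), 0) == 1:
        return "zero"
    return "missing"


# Made-up data, for illustration only.
observed = {(2018, 4, 32, "010111")}
zeros = index_zeros([
    {"t": 2018, "i": 4, "j": 32, "ztf1": 1, "ztf2": 1},
    {"t": 2018, "i": 4, "j": 36, "ztf1": 0, "ztf2": 0},
])
print(flow_status(observed, zeros, 2018, 4, 32, "010111"))  # observed
print(flow_status(observed, zeros, 2018, 4, 32, "020110"))  # zero
print(flow_status(observed, zeros, 2018, 4, 36, "010111"))  # missing
```

Passing `dummy="ztf2"` applies the stricter, dyad-level criterion instead.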

## 2.4 Reporter reliability

These files provide data on the reliability of each country when reporting exports and imports, in terms of value, quantity, and unit value. These figures correspond to $$\sigma$$ in the companion working paper (p. 18).

| Variable | Description |
|---|---|
| c | Country ISO 3-digit numeric code |
| v_unreliability_i | Unreliability of country $$c$$ for export values |
| v_unreliability_j | Unreliability of country $$c$$ for import values |
| q_unreliability_i | Unreliability of country $$c$$ for export quantities |
| q_unreliability_j | Unreliability of country $$c$$ for import quantities |
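For a flow from exporter $$i$$ to importer $$j$$, the relevant scores are the exporter's `_i` columns and the importer's `_j` columns. A minimal sketch of that pairing, assuming the reporter_reliability file has been loaded into a dictionary keyed by country code (an assumed structure; the function name and the scores are hypothetical):

```python
def flow_reliability(reliability, i, j):
    """Collect the unreliability scores relevant to a flow from exporter i to importer j.

    reliability: dict mapping a country code c to a dict holding that
    country's row of the reporter_reliability file.
    Scores are None when a country or column is absent.
    """
    exporter = reliability.get(i, {})
    importer = reliability.get(j, {})
    return {
        # The exporter is assessed on its export declarations (suffix _i) ...
        "v_exporter": exporter.get("v_unreliability_i"),
        "q_exporter": exporter.get("q_unreliability_i"),
        # ... and the importer on its import declarations (suffix _j).
        "v_importer": importer.get("v_unreliability_j"),
        "q_importer": importer.get("q_unreliability_j"),
    }


# Made-up scores, for illustration only.
reliability = {
    4: {"v_unreliability_i": 0.8, "v_unreliability_j": 0.6,
        "q_unreliability_i": 1.1, "q_unreliability_j": 0.9},
    32: {"v_unreliability_i": 0.3, "v_unreliability_j": 0.4,
         "q_unreliability_i": 0.7, "q_unreliability_j": 0.5},
}
print(flow_reliability(reliability, 4, 32))
```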

# 3 Descriptive statistics

Aggregate value: This is the yearly aggregate value of trade flows recorded in the 2020 version of BACI, in the 1992 HS nomenclature.

Aggregate quantity: Yearly aggregate quantity traded.

Number of dyads ($$ij$$): This is the number of distinct importer-exporter combinations with at least one non-zero trade flow in BACI. The sharp increase up to the mid-2000s reflects improvements in the coverage of our primary sources over this period.

Number of products ($$k$$): A product is defined as a 6-digit item of the HS nomenclature, 1992 revision.

Number of dyad-products ($$ijk$$): This is the number of distinct importer-exporter-product combinations with at least one non-zero trade flow in BACI.
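These statistics can be recomputed from the data files. A minimal sketch, assuming the rows have already been parsed into dictionaries with keys t, i, j, k, v and q (an assumed structure; `yearly_stats` is a hypothetical name, and the rows are made up). Since BACI records only strictly positive flows, every row counts towards the non-zero combinations; skipping missing quantities in the aggregate is a choice made here.

```python
from collections import defaultdict


def yearly_stats(rows):
    """Per-year aggregate value and quantity, and counts of distinct ij, k and ijk."""
    acc = defaultdict(lambda: {"value": 0.0, "quantity": 0.0, "dyads": set(),
                               "products": set(), "dyad_products": set()})
    for r in rows:
        s = acc[r["t"]]
        s["value"] += r["v"]                  # thousands of current USD
        if r["q"] is not None:
            s["quantity"] += r["q"]           # metric tons; missing q skipped
        s["dyads"].add((r["i"], r["j"]))
        s["products"].add(r["k"])
        s["dyad_products"].add((r["i"], r["j"], r["k"]))
    return {t: {"value": s["value"], "quantity": s["quantity"],
                "dyads": len(s["dyads"]), "products": len(s["products"]),
                "dyad_products": len(s["dyad_products"])}
            for t, s in acc.items()}


# Made-up rows, for illustration only.
sample_rows = [
    {"t": 2018, "i": 4, "j": 32, "k": "010111", "v": 10.0, "q": 1.0},
    {"t": 2018, "i": 4, "j": 32, "k": "020110", "v": 5.0, "q": None},
    {"t": 2018, "i": 4, "j": 36, "k": "010111", "v": 2.5, "q": 0.5},
]
print(yearly_stats(sample_rows)[2018])
```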

# 4 Countries included in BACI

The countries in BACI are inherited from the UN trade data (Comtrade) on which the database is built. There are nevertheless some slight differences: the construction of BACI rests on a reconciliation procedure that compares the declarations of exporters and importers, and therefore requires both declarations to use the same country definitions. This is achieved by choosing the largest existing entity: for instance, because some countries do not report trade with Belgium and Luxembourg separately, we group them under the ISO code corresponding to the single entity “Belgium - Luxembourg”. The table below lists all the countries for which Comtrade has trade data after 1995 (the first year of BACI) that are not residual areas (“Not Elsewhere Specified”) but are nevertheless absent from BACI.
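Users who combine BACI with another source that reports Belgium and Luxembourg separately may want to apply the same grouping. A minimal sketch, assuming the UN/ISO numeric codes 56 (Belgium), 442 (Luxembourg) and 58 (Belgium-Luxembourg), which should be checked against the country_codes file; `regroup` is a hypothetical name, and dropping flows between the two members (which become internal to the grouped entity) is a choice made here, not a documented BACI step.

```python
from collections import defaultdict

# Assumption: UN/ISO numeric codes; verify against the country_codes file.
GROUPING = {56: 58, 442: 58}  # Belgium, Luxembourg -> Belgium-Luxembourg


def regroup(flows, grouping=GROUPING):
    """Re-express flows with grouped country codes and sum their values.

    flows: iterable of (t, i, j, k, v) tuples.
    Flows that become internal to a grouped entity (i == j) are dropped.
    """
    totals = defaultdict(float)
    for t, i, j, k, v in flows:
        i2, j2 = grouping.get(i, i), grouping.get(j, j)
        if i2 == j2:
            continue  # e.g. Belgium -> Luxembourg becomes an internal flow
        totals[(t, i2, j2, k)] += v
    return dict(totals)


# Made-up flows to France (250), for illustration only.
flows = [(2018, 56, 250, "010111", 3.0),
         (2018, 442, 250, "010111", 1.0),
         (2018, 56, 442, "010111", 2.0)]
print(regroup(flows))
```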

# 5 Differences across HS revisions

More recent revisions of the HS nomenclature are available only for more recent years, so users studying earlier years have to use an older HS revision.

Comtrade does not choose the revision in which each country reports its trade flows. Conversion tables are therefore used to convert trade flows originally reported in a given revision into older revisions.
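In the simplest case, converting flows into an older revision means re-coding each product through a correspondence table and summing the values that land on the same older code. A minimal sketch of that many-to-one case only: real HS correspondence tables also contain one-to-many links, which require splitting a flow across several older codes, and this sketch does not attempt that. The function name and the product codes are hypothetical.

```python
from collections import defaultdict


def convert_revision(flows, correspondence):
    """Re-code flows from a newer HS revision into an older one (many-to-one only).

    flows: iterable of (t, i, j, k_new, v) tuples.
    correspondence: dict mapping each newer 6-digit code to a single older code.
    Codes without a mapping raise KeyError, so gaps are not silently dropped.
    """
    totals = defaultdict(float)
    for t, i, j, k_new, v in flows:
        totals[(t, i, j, correspondence[k_new])] += v
    return dict(totals)


# Hypothetical codes: two newer codes merged into one older code.
table = {"999901": "999900", "999902": "999900"}
flows = [(2018, 4, 32, "999901", 1.5), (2018, 4, 32, "999902", 2.5)]
print(convert_revision(flows, table))
```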

The aggregate recorded trade value does not differ much across HS revisions.

Nevertheless, for a given revision, the number of products with non-zero trade flows tends to decrease over time.

However, using a more recent HS revision leaves fewer dyads available.

In the end, the number of trade flows recorded in BACI does not differ much across revisions.

The aggregate quantity sometimes differs across revisions, reflecting differences in the conversion factors (from other quantity units to kg), which are determined separately for each revision.